UNCLASSIFIED 


AD  NUMBER 

ADB043952 

LIMITATION  CHANGES 
TO: 

Approved  for  public  release;  distribution  is 
unlimited. 


FROM: 

Distribution authorized to U.S. Gov't. agencies
only; Test and Evaluation; DEC 1979. Other
requests shall be referred to Rome Air
Development Center, Attn: RADC/ISFA, Griffiss
AFB, NY 13441.


AUTHORITY 

RADC  ltr  dtd  11  Mar  1982 


THIS  PAGE  IS  UNCLASSIFIED 


THIS REPORT HAS BEEN DELIMITED
AND CLEARED FOR PUBLIC RELEASE
UNDER DOD DIRECTIVE 5200.20 AND
NO RESTRICTIONS ARE IMPOSED UPON
ITS USE AND DISCLOSURE.

DISTRIBUTION STATEMENT A

APPROVED FOR PUBLIC RELEASE;
DISTRIBUTION UNLIMITED.




In-House Report
December 1979


SOURCE  DATA  AUTOMATION 
TECHNIQUES  STUDY  FOR  AFMPC 


Leon  McDowell 


DISTRIBUTION LIMITED TO U.S. GOVERNMENT AGENCIES ONLY; TEST
AND EVALUATION; DEC 1979. OTHER REQUESTS FOR THIS DOCUMENT
MUST BE REFERRED TO RADC (ISFA), GRIFFISS AFB NY 13441.


ROME  AIR  DEVELOPMENT  CENTER 

Air  Force  Systems  Command 

Griffiss  Air  Force  Base,  New  York  13441 




UNCLASSIFIED 


SECURITY  CLASSIFICATION  of  THIS  PAGE  (When  Data  Entered) 


REPORT  DOCUMENTATION  PAGE 


READ  INSTRUCTIONS 
BEFORE  COMPLETING  FORM 


Rome  Air  Development  Center  (ISFA) 
Griffiss  AFB  NY  13441 


11. CONTROLLING OFFICE NAME AND ADDRESS

Rome  Air  Development  Center  (ISFA) 
Griffiss  AFB  NY  13441 


12. REPORT DATE

December 1979


13.  NUMBER  OF  PAGES 

89 


14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

Same


15. SECURITY CLASS. (of this report)

UNCLASSIFIED


15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

N/A


16. DISTRIBUTION STATEMENT (of this Report)


Distribution  limited  to  U.  S.  Government  agencies;  test  and  evaluation; 
December  1979.  Other  requests  for  this  document  must  be  referred  to 
RADC  (ISFA),  Griffiss  AFB  NY  13441. 


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

Same 




18.  SUPPLEMENTARY  NOTES 

None 




19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

Source  Data  Automation 
Source  Data  Entry 
Data  Entry 
Data  Input 




20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

Source data automation has often been defined as the efficient
capture of data at the source, in addition to data handling. The
speed with which computers are capable of manipulating data and
executing instructions has increased significantly within this
decade. However, the ability to input data to the computer has
always been an area of concern. As a result, this deficiency has
limited the potential throughput of current high-speed systems. The
speed differential between manual data entry and computer processing
has necessitated the development of source data automation
techniques and devices in an attempt to solve the input/output
problem.

The spectrum of data entry techniques consists of the keypunch
operation at one end, followed by keyboard entry devices such as
key-to-tape, key-to-disc and the CRT terminal; then the optical
readers with voice input at the other end. A full progression from
one end of the spectrum to the other has not been fully achieved at
this time. A few applications still use the keypunch operation.
However, it was forced to give way to the more efficient
key-to-storage systems. Today, optical recognition technology is
assuming a greater role in data entry. Voice input technology has
progressed steadily in this decade. However, it is still largely
developmental, and its use in data entry applications is somewhat
limited.



TABLE OF CONTENTS


SECTION

I    INTRODUCTION

II   TECHNOLOGY DISCUSSION

III  TECHNICAL DESCRIPTION

     A. Keypunch

     B. Keyboard Entry

     C. Optical Readers

     D. Mixed-Media

     E. Voice Input

IV   SOURCE DATA AUTOMATION EQUIPMENT PROFILE

V    ECONOMIC CONSIDERATIONS

VI   SYSTEM CONCEPTS

VII  SUMMARY AND CONCLUSIONS

REFERENCES




LIST OF ILLUSTRATIONS


Figure

Source Data Automation Technology Spectrum
Basic Elements of a Key-to-Tape System
Basic Elements of a Key-to-Disc System
Functional Diagram of a CRT Terminal
Basic Elements of an OCR System
Mechanical Scanner with Rotating Disc
CRT Scanner
Scanning a Character
Laser-Beam Spinning Mirror Scanner
Solid State Linear Array
Solid State Area Array
Cost Per Character vs Volume
Monthly Operating Cost vs Volume
Concept of Existing System
Concept No. 1
Volume vs Transmission Time
Break-Even Line
Monthly Cost of Leased Lines
Concept No. 2
Concept No. 3
Cost vs Transmission Time (Switched Lines)
Concept No. 4


LIST OF TABLES


Table

Keypunch Cost/Performance Data
Key-to-Disc Cost/Performance Data
OCR (Single Font) Cost/Performance Data
OCR (Multifont) Cost/Performance Data
Summary of Concepts




I.  INTRODUCTION: 


This report documents the results of an In-House Study in the area
of Source Data Automation Techniques. The requirement for this study
is contained in Program Management Directive (PMD) R-2144(4), dated
16 Jun 76. This PMD covers program direction on Project 1266
(Program Element 64708F).

The purposes of this study were to determine the capabilities which
can be provided by the current state-of-the-art in source data
automation and related technologies, and to underscore trends in
future capabilities. The study was conducted in support of the Air
Force Manpower and Personnel Center's (AFMPC) Microform System
located at Randolph AFB TX.

The AFMPC maintains microfiche records of all active duty Air Force
personnel. These records are maintained by keystroking input data
into the computer-controlled microfiche records retrieval system.
This input data consists of typewritten Air Force and Department of
Defense documents which are created at base level and forwarded to
AFMPC via the Postal Service. The AFMPC is interested in exploring
the technical and economic feasibility of incorporating source data
automation techniques into the system. It is anticipated that the
data presented in this study will aid the AFMPC in assessing the
cost-effectiveness of automating specific system functions and in
establishing reasonable estimates in the areas of performance and
cost for future planning purposes.

Data for the study was gathered from the following sources:

Literature  Search  - An  extensive  literature  search  was 
conducted  and  numerous  technical  articles  from  the  various 
trade  journals  were  obtained,  along  with  many  technical 
reports  from  commercial  and  military  agencies. 

Technical Survey - A survey was conducted in which a form letter was
distributed among companies considered knowledgeable in any of the
technologies related to source data automation, particularly Optical
Character Recognition.

Travel - Selected agencies and companies were visited to obtain data
as well as discuss some of the critical aspects of this technology.
The Optical Character Recognition (OCR) Users Conference was
attended in Hershey, PA during the summer of 1978. At that time,
very valuable literature was obtained on existing OCR equipment and
techniques.

RADC Programs - RADC/IRA, under the direction of Dr. Bruno Beek, has
been involved in the development of voice input technology for many
years. Numerous technical reports on the subject were acquired from
that office in addition to technical papers written by IRA experts
and others working in the area.

II. TECHNOLOGY DISCUSSION:

The electronic data processing industry has made significant
increases in the internal data processing capabilities of the
computer; however, it has failed to develop input/output techniques
and equipment to the same degree of efficiency. As a result, this
deficiency has limited the potential throughput of current
high-speed systems. The speed differential between manual data entry
(such as keypunching) and computer processing has necessitated the
development of source data automation techniques and devices in an
attempt to solve the input/output problem.

Source data automation has often been defined as the efficient
capture of data at the source, in addition to data handling. It has
been a subject of study among electronic data processing experts for
many years. The speed with which computers are capable of
manipulating data and executing instructions has increased
significantly within this decade. However, the ability to input data
to the computer has always been an area of concern.

The spectrum of data entry techniques consists of the "old" keypunch
operation at one end, followed by keyboard entry devices and optical
readers with voice input at the other end. A full progression from
one end of the spectrum to the other has not been fully achieved at
this time. A few applications still use the keypunch operation.
However, it was forced to give way to the more efficient
key-to-storage systems. Today, optical recognition technology is
assuming a greater role in data entry. Voice input technology has
progressed steadily in this decade. However, it is still largely
developmental, and its use in data entry applications is somewhat
limited.

The evolution of source data automation concepts usually begins with
a discussion of source data capture in a form not suitable for
direct entry into the computer, thus requiring a transcription
process for data entry. If data is captured at a remote location,
the method of transfer to the processing site also becomes
important. Therefore, source data automation may be discussed in
terms of both the technique used in source data capture and the
method of data transfer from the point of capture to the processing
location.

Reference 1 estimates that 25 to 50 percent of all data processing
operating costs are consumed by the data entry process, with the
largest cost factor being personnel such as keypunch operators.
Technological developments over the past 15 years have led to more
sophisticated data entry systems designed to capture data more
efficiently and economically than keypunching. Key-to-storage
systems permit faster operator entry while performing preprocessing
activities such as data format and verification checks. These data
entry systems, however, still require operators and the associated
labor cost. This reference, like others written on the subject,
considers that the key to efficient and economical data entry lies
in the elimination of the data transcription step with its
associated labor cost and its replacement with direct data

form either in its original format or on special punched card
transcription forms. The keypunch operator must then punch the cards
in a specific format. To assist the operator, a control card is
initially made up to prescribe the fields to be punched and skipped.
To assure accuracy, the data is usually entered a second time by a
verifier to check for errors. With newer buffered keypunch machines,
the keying operation is faster and limited card edit checks are
performed.

From a system point of view, the keypunch is a very slow means of
data entry. It requires a rigid format and is costly when used in a
multi-keypunch installation. However, established usage of the unit
record concept in existing system designs, the ease of punching
small jobs and the ease of data insertion will keep the keypunch in
operation. Reference 2 states that modern day keypunches which are
buffered can be used for both punching and verifying. They are
available in both the interpreting models (which print along the top
edge of the card while punching) and non-interpreting models. There
are 80-column keypunches supplied by Decision Data, IBM and Univac.
IBM and Decision Data also make the 96-column keypunch. However,
this reference suggests that this format is near extinction and as
such does not recommend acquiring one.

The conditions which tend to favor the selection of a keypunch for
data entry are:

(1) The need for a small number of data entry stations.

(2) The need for a relatively small number of program formats (10 or
fewer).

(3) The ability to work effectively with 80 to 96 character records.

(4) The availability of strong editing procedures in the central
computer for subsequent discovery of errors rather than relying on
error discovery during the data entry process.

(5) The absence of a need for rapid and systematic searching of the
recorded data records.

(6) The absence of a need for immediate printout.

These conditions suggest that if the data entry activity is
relatively straightforward and not overly voluminous, then the
keypunch might well be the most cost-effective data entry method.

B.  KEYBOARD  ENTRY: 

The need to increase the efficiency of the keypunch operation and
lower data entry cost without changing data preparation procedures
led to the development of keypunch replacement equipment.
Key-to-storage and CRT terminal systems are the most widely used
methods of keyboard data entry.

1. The Key-to-Storage System, which includes both Key-to-Tape and
Key-to-Disc Systems, provides a more efficient means of data entry
as compared to the keypunch system.

The Key-to-Tape System implies stand alone replacement of keypunch
units. The basic elements of a key-to-tape system, as noted in
Figure 2, are the keyboard, a tape transport, a buffer and
associated logic. Some units use cassettes or cartridges as an
intermediate medium but generally data is stored on half-inch
computer compatible tape. The only essential difference between a
stand alone key-to-tape and a keypunch unit is that the data is
placed onto magnetic tape rather than punched cards. In order to
reduce tape setup time, the tapes produced from a number of units
are merged into a single tape by a pooler.

The Key-to-Disc System is an extension of the key-to-tape system in
which the processor is enhanced by the addition of disc memory. As
noted in Figure 3, each system consists of multiple keyboard
terminals, a supervisory terminal, a small computer, a magnetic disc
and a tape drive. The multi-keyboard stations connect to a shared
processor.

The disc memory contains the program library and operating system
routines, and serves as a mass storage area for keyboard station
input. After processing and formatting, the data is transferred from
the disc to the magnetic tape. Because of the increased power of the
key-to-disc processor, more sophisticated error checks and editing
are performed. Reference 3 estimates that key-to-storage systems
offer a 20 to 40 percent increase in data entry throughput with a
substantial reduction in user costs. Depending on the system and the
user, efficiency is obtained through increased operator speed,
reduced error correction costs, increased flexibility, minimized
overtime operation and the elimination of card and storage cost.

The shared processor systems offer both a price and speed advantage
over most stand alone units provided that a number of operators are
required. Since several keyboards share a processor, intermediate
storage devices and tape drives, the cost of having a tape drive and
control unit for each keyboard is eliminated, and the cost of a
larger, more versatile processor and an intermediate storage device
is substituted. Thus, for a small number of keyboards such a system
would be more expensive; for a larger number, the price per keyboard
drops as the number of keyboards increases. The break-even point
varies with each system. However, Reference 4 suggested that at
least six to eight keystations are required for this concept to be
cost competitive with other methods of data entry.
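
The break-even argument above can be illustrated with a minimal
sketch. The prices used here are hypothetical placeholders chosen
only so that the result falls in the six to eight keystation range
cited from Reference 4; they are not taken from the report's cost
tables, and the function names are illustrative.

```python
import math

def standalone_cost(n_stations, unit_price):
    """Stand-alone units: every station carries its own tape drive and control unit."""
    return n_stations * unit_price

def shared_cost(n_stations, processor_price, keystation_price):
    """Shared processor: one processor/disc/tape drive plus a cheaper keyboard per station."""
    return processor_price + n_stations * keystation_price

def break_even_stations(unit_price, processor_price, keystation_price):
    """Smallest station count at which the shared system costs no more than stand-alone units."""
    saving_per_station = unit_price - keystation_price
    if saving_per_station <= 0:
        return None  # the shared system never catches up
    return math.ceil(processor_price / saving_per_station)

# Hypothetical monthly lease prices (assumptions, not data from the report).
UNIT_PRICE = 250.0        # one stand-alone key-to-tape station
PROCESSOR_PRICE = 1050.0  # shared processor, disc and tape drive
KEYSTATION_PRICE = 100.0  # one keyboard on the shared system

n = break_even_stations(UNIT_PRICE, PROCESSOR_PRICE, KEYSTATION_PRICE)
print("break-even keystations:", n)
for stations in (4, n, 12):
    per_station = shared_cost(stations, PROCESSOR_PRICE, KEYSTATION_PRICE) / stations
    print(stations, "stations, shared cost per station:", round(per_station, 2))
```

Below the break-even count the fixed processor cost dominates; above
it, the cost per keyboard keeps falling as stations are added.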


2. CRT Terminal - The CRT Display Terminal is an on-line
input/output device designed for two-way transmission of data with a
control computer. Figure 4 depicts the functional units of a CRT
Terminal, which are the CRT Display Screen coupled with a keyboard,
a buffer memory for display refresh, control electronics, character
generation and transmission interface.

The CRT Display is used to show input and output data. Therefore,
one of the most important features is the screen capacity. Typical
units are capable of displaying up to approximately 1,920 characters
in a single display in a format of 80 characters per line and 24
lines per display. A cursor typically is used on the screen to show
where the next character will appear. When the terminal is under the
control of the operator, the cursor can be moved to any location on
the screen and thus new information can be entered or old
information deleted and/or updated. The edit functions typically
include: erase character, erase to end of line, erase to end of
message, erase to end of display, insert character in line and
display, delete character in line and display, and insert and delete
line. The keyboard is usually a standard typewriter type keyboard
with cursor control keys and some additional special function keys
provided to increase flexibility. The key cap explains the function
performed by the system when it receives the unique code that is
generated by that particular function key.
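
As an illustration of the cursor and edit functions just listed, the
following minimal sketch models an 80 by 24 display as a character
grid. The class and method names are hypothetical and only a few of
the edit functions are shown; it is not a description of any
particular terminal.

```python
COLS, ROWS = 80, 24  # 80 characters per line x 24 lines = 1,920 positions

class Screen:
    """Toy model of a CRT display buffer with a cursor."""

    def __init__(self):
        self.cells = [[" "] * COLS for _ in range(ROWS)]
        self.row = 0
        self.col = 0

    def move_cursor(self, row, col):
        if not (0 <= row < ROWS and 0 <= col < COLS):
            raise ValueError("cursor position is off the screen")
        self.row, self.col = row, col

    def type_char(self, ch):
        """Write at the cursor, then advance (wrapping to the next line)."""
        self.cells[self.row][self.col] = ch
        if self.col < COLS - 1:
            self.col += 1
        elif self.row < ROWS - 1:
            self.row, self.col = self.row + 1, 0

    def erase_to_end_of_line(self):
        for c in range(self.col, COLS):
            self.cells[self.row][c] = " "

    def insert_char(self, ch):
        """Insert in line: shift the rest right; the last character on the line is lost."""
        line = self.cells[self.row]
        line.insert(self.col, ch)
        line.pop()

    def delete_char(self):
        """Delete in line: shift the rest left and pad with a blank."""
        line = self.cells[self.row]
        del line[self.col]
        line.append(" ")

    def line_text(self, row):
        return "".join(self.cells[row]).rstrip()

screen = Screen()
for ch in "HELLO":
    screen.type_char(ch)
screen.move_cursor(0, 1)
screen.insert_char("X")
print(screen.line_text(0))
```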

The refresh memory is usually large enough to store one "page" of
data or a full screen. However, a concept known as paging allows
many more lines of information to be stored in the memory than can
be displayed on the screen at any one time. Display Terminals with
this feature require additional local memory for storing extra lines
or pages of non-displayed data. The stored data can be displayed on
command, a page at a time, or rolled back and forth in front of the
display window (viewing area). Although a listing on a CRT is
normally limited to 24 lines, "paging" permits data lines well in
excess of 24 to be stored and displayed.
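
The paging concept can be sketched as a 24-line window that moves
over a larger local memory. This is a simplified illustration with
hypothetical names; actual terminals implement paging in hardware
and differ in detail.

```python
SCREEN_LINES = 24  # lines visible in the display window

class PagedMemory:
    """Local memory holding more lines than the screen can show at once."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.top = 0  # index of the first line currently displayed

    def window(self):
        """Lines currently visible in the display window."""
        return self.lines[self.top:self.top + SCREEN_LINES]

    def page_forward(self):
        """Display the next full page, if one exists."""
        if self.top + SCREEN_LINES < len(self.lines):
            self.top += SCREEN_LINES

    def page_back(self):
        self.top = max(0, self.top - SCREEN_LINES)

    def roll(self, n):
        """Roll the window by n lines (negative rolls back), staying within the stored data."""
        last_top = max(0, len(self.lines) - SCREEN_LINES)
        self.top = min(max(self.top + n, 0), last_top)

memory = PagedMemory("line %d" % i for i in range(60))
memory.page_forward()
print(memory.window()[0])
```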

The  control  electronics  provide  the  necessary 
sequence,  timing  and  operational  signals  to  let  the  display 
unit  perform  its  various  functions. 

The character generator accepts coded characters (typically ASCII)
from the computer and keyboard, and converts them to a pattern that
can be "drawn" on the face of the CRT Display. Typical units are
capable of generating up to 64 different alphanumeric characters
including upper and lower case alphabets, the standard numerals and
other special symbols.


The transmission interface connects the display unit to the
communications computer system. It usually conforms to the ASCII
code and discipline, meets the electrical and logical requirements
of the standard EIA RS-232 specification, and connects to a modem or
acoustic coupler at speeds between 110 and 9,600 bits per second. In
some cases where the terminal-to-computer distance is not too great,
the modem can be eliminated and the terminal can be connected to the
computer directly.
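
To give a feel for the 110 to 9,600 bits per second range, the
sketch below estimates the time needed to send one full screen of
1,920 characters. The framing is an assumption not stated in the
report: 10 bits per character on an asynchronous line (start bit, 7
data bits, parity, stop bit), and 11 bits at 110 bits per second,
where two stop bits were customary.

```python
def transmit_seconds(chars, bits_per_second, bits_per_char=10):
    """Time to send `chars` characters, ignoring modem turnaround and protocol overhead."""
    return chars * bits_per_char / bits_per_second

FULL_SCREEN = 80 * 24  # 1,920 characters

# (line speed in bits per second, assumed bits per character on the wire)
for bps, framing in ((110, 11), (300, 10), (1200, 10), (9600, 10)):
    seconds = transmit_seconds(FULL_SCREEN, bps, framing)
    print("%5d bit/s: %6.1f s per full screen" % (bps, seconds))
```

Under these assumptions a full screen takes about three minutes at
the low end of the range and about two seconds at the high end.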

CRT Display Terminals used on line for data entry provide a direct
man-machine interface. With the appropriate systems software, the
computer can prompt the operator with requests or replies, and can
detect errors during data entry. The operator can usually verify the
data on the screen and therefore reduce significantly the input
error rate problem. Therefore, the use of a CRT Display considerably
eases data editing and correction. Arguments against displays, such
as their not being necessary to operators for keying, would be valid
if everything were entered perfectly.

In the real world, however, where errors occur and corrections are
needed during entry and verification, the display becomes a valuable
aid. The existence of a display alone does not assure efficient
operation. The data must be presented in an easily readable format
that defines data fields. For purposes of controlling operator entry
of data, the software system can be designed to present formatted
displays with predetermined fields for the operator entry of data.
Data generated by the system would be categorized as a protected
field (operator cannot change data). The data entered by the
operator in the appropriate fields (operator can enter/change data)
is unprotected. During transmission, only the unprotected fields are
transmitted.
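
The protected and unprotected field behavior described above can be
sketched as follows. The field names and the `FormattedDisplay`
class are hypothetical; the sketch only shows that operator entry is
refused on protected fields and that transmission carries the
unprotected fields alone.

```python
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    value: str
    protected: bool  # True: generated by the system, operator cannot change it

class FormattedDisplay:
    def __init__(self, fields):
        self.fields = {f.name: f for f in fields}

    def enter(self, name, value):
        """Operator entry; allowed only in unprotected fields."""
        field = self.fields[name]
        if field.protected:
            raise PermissionError("field %r is protected" % name)
        field.value = value

    def transmit(self):
        """Only the unprotected fields are sent to the computer."""
        return {f.name: f.value for f in self.fields.values() if not f.protected}

form = FormattedDisplay([
    Field("name_label", "NAME:", protected=True),
    Field("name", "", protected=False),
    Field("grade_label", "GRADE:", protected=True),
    Field("grade", "", protected=False),
])
form.enter("name", "DOE JOHN A")
form.enter("grade", "SSGT")
print(form.transmit())
```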

C.  OPTICAL  READERS: 

Optical Readers are devices which employ a scanning process to
search a document for marks, an array of code bars or alphanumeric
characters and convert the optical impulses obtained via reflected
or transmitted light to an appropriate electrical signal for further
processing. As suggested above, Optical Readers can be categorized
by the type of symbol read. The three principal groupings are
Optical Mark Readers, Optical Bar Code Readers and Optical Character
Readers. The Mark and Bar Code Readers correlate the position of
marks and code bars respectively, while the character reader
identifies each character by comparing its features with those of
pre-stored characters.

1. Optical Mark Reader - The Optical Mark Reader is the simplest
type of optical reader and is used primarily for test scoring,
inventory control, data collection or other "mark the correct box"
applications. Some of the mark readers handle conventional 80-column
cards while others are capable of handling full-page documents. They
typically read data in one of two ways. The first is by interpreting
rows of marks in exactly the same way as holes in a card are
interpreted. Thus, one column of marks can be used to represent one
character. Some readers are designed to interpret data coded on
conventional punched cards consistent with the Hollerith code. The
second technique used by mark readers transmits a binary image of
the marks to the computer, which is then interpreted by software.
They may interface directly or over a communications line with a
computer. They may also operate off-line, feeding data onto magnetic
or paper tape for future computer entry.
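
The first reading technique, in which a column of marks is
interpreted like a column of punched holes, can be sketched with the
alphanumeric part of the Hollerith code. Card rows are named 12, 11
and 0 through 9; a digit is a single mark in its own row, and
letters combine a zone row (12, 11 or 0) with a digit row. Special
characters are omitted, and the function names are illustrative.

```python
def build_hollerith_table():
    """Map a set of marked rows in one column to the character it represents."""
    table = {frozenset(): " "}                      # no marks: blank column
    for digit in range(10):
        table[frozenset([digit])] = str(digit)      # digits: one mark in rows 0-9
    for row, ch in enumerate("ABCDEFGHI", start=1):
        table[frozenset([12, row])] = ch            # A-I: zone 12 + rows 1-9
    for row, ch in enumerate("JKLMNOPQR", start=1):
        table[frozenset([11, row])] = ch            # J-R: zone 11 + rows 1-9
    for row, ch in enumerate("STUVWXYZ", start=2):
        table[frozenset([0, row])] = ch             # S-Z: zone 0 + rows 2-9
    return table

HOLLERITH = build_hollerith_table()

def decode_columns(columns):
    """Decode a sequence of columns; unrecognized mark patterns become '?'."""
    return "".join(HOLLERITH.get(frozenset(col), "?") for col in columns)

marked = [{12, 1}, {12, 6}, {11, 4}, {11, 7}, {12, 3}]
print(decode_columns(marked))
```

A reader using the second technique would instead pass the raw mark
image to the computer and leave a table lookup of this kind to
software.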

2. Optical Bar Code Reader - Optical Bar Code Readers are electronic
reading devices which optically sense special combinations or
arrangements of bars and correlate these bars to previously defined
characters. The type of code used varies, but in most cases the
codes cannot be formed by hand and are not easily readable by
humans. Usually, special devices are required to produce the bar
code imprinting. The most widely used code pattern is the Universal
Product Code (UPC). Bar Code Readers, like Mark Readers, may
interface directly to computers or operate off-line with tape
recorders.
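
The report does not describe the UPC symbology itself. As a hedged
illustration of how decoded bar code digits can be validated after
reading, the sketch below applies the standard check-digit rule for
the 12-digit UPC-A code: three times the sum of the digits in odd
positions plus the sum of the digits in even positions, with the
check digit bringing the total to a multiple of ten. The function
names are illustrative.

```python
def upc_check_digit(first_eleven):
    """Compute the 12th (check) digit of a UPC-A code from its first 11 digits."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 decimal digits")
    digits = [int(c) for c in first_eleven]
    odd_sum = sum(digits[0::2])    # positions 1, 3, ..., 11
    even_sum = sum(digits[1::2])   # positions 2, 4, ..., 10
    return (10 - (3 * odd_sum + even_sum) % 10) % 10

def is_valid_upc(code):
    """True if `code` is 12 digits and its last digit matches the computed check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_check_digit(code[:11]) == int(code[11])

print(upc_check_digit("03600029145"))
print(is_valid_upc("036000291452"))
```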
3. Optical Character Reader (OCR) - These systems are rapidly
evolving as an economical means of inputting data into a computer
system. The optical character reading method employed in OCR systems
is similar to the reading method used by humans. When light is
placed on a form containing data, we search or scan the form, and
the optical image of the characters is reflected on the retina of
the eye. These images are transformed into nerve impulses, and
transmitted to the brain. The brain has been programmed through
learning to identify and recognize a variety of characters. Optical
character readers employ a scanning mechanism to convert reflected
light pulses representing particular characteristics of the
character to be read into electrical pulses. These electrical
signals are then used to "recognize" the character by comparing
these signals with matching sets of signals stored in memory in
order to determine the identity of the character.
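
The comparison of scanned signals against stored sets of signals can
be illustrated with a very small template matcher. The 3 by 5 dot
patterns, the three-character alphabet and the mismatch threshold
are all hypothetical simplifications; commercial readers use far
finer grids and more elaborate feature comparisons.

```python
# Stored reference patterns: '1' = dark, '0' = light (hypothetical 3x5 grid).
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}

def mismatches(scanned, template):
    """Count grid positions where the scanned pattern differs from the template."""
    return sum(
        s != t
        for scan_row, tmpl_row in zip(scanned, template)
        for s, t in zip(scan_row, tmpl_row)
    )

def recognize(scanned, max_mismatches=1):
    """Return the best-matching character, or None (reject) if nothing is close enough."""
    best_char, best_score = None, None
    for ch, template in TEMPLATES.items():
        score = mismatches(scanned, template)
        if best_score is None or score < best_score:
            best_char, best_score = ch, score
    return best_char if best_score <= max_mismatches else None

noisy_t = ("111", "010", "010", "010", "000")  # a 'T' with one dot missing
print(recognize(noisy_t))
print(recognize(("000", "000", "000", "000", "000")))
```

A `None` result corresponds to a character the reader cannot
identify with sufficient confidence.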

Figure 5 depicts the functional elements of an OCR System. The input
documents are fed to the scanner unit by the feeder and transport,
after which the data acquired by the scanner is used by the
recognition unit to identify the data. Depending upon the success of
the recognition task, the feeder and transport unit sends the
document to either the accept or reject bin of the output stacker.

Figure 5. Basic Elements of an OCR System


INPUT DOCUMENTS - The input documents may be of the turnaround type
(utility bills, charges, etc.) which carry printed or sometimes
handwritten data. The quality of the input document and the data on
it are important for proper operation of an OCR machine. Excellent
reading performance can be achieved when forms on which data are
printed conform rigidly to machine specifications. Reference 1
indicates that the quality of the input data involves a number of
factors: the device used to print the data, whether it is a computer
line printer, a typewriter, or a typesetting machine; the print
ribbon used on typewriters; the ink used to print the forms; the
paper; and the design of the input documents. The printing mechanism
is extremely important because the OCR units become increasingly
expensive in proportion to their ability to handle misregistration,
blurred characters and skewed lines. Therefore, a precisely printed
document can have a great effect on the cost of a reader capable of
a given throughput. Most current readers require special typewriter
and printer ribbons to assure sharp images resistant to
deterioration and producing minimum reflectance on the paper. With
regard to forms design, the most important considerations are to
account for the machine limitations of the reader to be used, i.e.,
font capabilities, registration tolerances and skew, and to minimize
space between OCR fields to reduce unnecessary scanning.

Font types are especially significant to the OCR field since they
govern the nature of the input medium. OCR readers fall into three
broad categories. They are the single-font, the multi-font and the
omni-font reader. A single-font reader "reads" a single printed or
typewritten font. A multi-font reader is capable of reading a
variety of fonts, one at a time. An equipment adjustment is usually
necessary in changing from one font to another. The omni-font reader
is capable of reading multiple fonts which are mixed together by
performing a character set analysis immediately prior to reading.

A significant problem faced by the multi-font and omni-font readers
is the wide variation in type sizes as well as type styles. The size
of the area that the letter occupies must be defined in order to
determine all aspects of an individual shape. This is difficult when
type sizes vary. The problem is also encountered by readers that
must deal with proportional spacing, since the size of the area
containing the character is not predictable. Single-font and most
multi-font readers deal with fixed pitch fonts (fixed spacing)
rather than proportional spacing and have a limit on allowable type
sizes in order to provide enough decision points to accurately
identify the character.
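
The advantage of fixed pitch can be seen in a small sketch: when
every character occupies a cell of known width, a scanned line can
be cut into character cells by simple arithmetic, with no need to
search for the boundaries between letters. The 3-sample cell width
and the dot patterns are hypothetical; proportional spacing would
require locating each boundary from the image itself.

```python
def segment_fixed_pitch(scan_rows, cell_width):
    """Cut a scanned line (a list of equal-length rows of '0'/'1') into fixed-width cells."""
    width = len(scan_rows[0])
    if any(len(row) != width for row in scan_rows):
        raise ValueError("all scan rows must have the same length")
    cells = []
    # Step across the line one cell at a time; a partial cell at the end is ignored.
    for left in range(0, width - cell_width + 1, cell_width):
        cells.append(tuple(row[left:left + cell_width] for row in scan_rows))
    return cells

# Two characters scanned side by side at a pitch of 3 samples per character.
line = (
    "111100",
    "010100",
    "010100",
    "010100",
    "010111",
)
for cell in segment_fixed_pitch(line, cell_width=3):
    print(cell)
```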

Reference 5 reflects some problems experienced by a military
organization. The article discusses the incorporation of an OCR
capability by the U.S. Marine Corps. The keypunching operation was
replaced by OCR typewriters with a USASI type font for data
preparation prior to entry in the computer. With regard to OCR
typewriters, the author emphasizes that the OCR font is only one
consideration which led to the replacement of the older nonelectric
typewriters. Ribbons and embossing density are critical. Reusable
cloth ribbons "splash" when struck by the key, and the scanner reads
the splash mark as part of the character. Key pressure must be
accurate so that the typed character is free from shadows caused by
hitting the page too hard or unblackened areas in the letters caused
by hitting the page too lightly. Another critical parameter is
vertical alignment. If vertical alignment is not maintained (e.g.,
six lines per inch), by the time the scanner is halfway down the
page the scanning aperture will be skimming the top or bottom of the
characters and the line will be rejected. The printer should be one
with a proven record of satisfactory performance in preparing OCR
forms.

FEEDER TRANSPORT/OUTPUT STACKER - The function of the transport
unit is to move the forms from the input hopper past the scanner
read station and then to the output stacker. This unit
represents the most critical and costly element of a reader.
Reference 1 states that there are essentially four types of
transports, each associated with a different type of OCR device.
Transports used for handling cards or small paper forms (up to
4 x 9 inches) are associated with document readers. Transports
for handling variable size forms from card size to 11 x 14 inch
paper are associated with page readers. Transports for handling
continuous paper rolls (cash register or adding machine tapes)
are used in journal type readers. Finally, transports for
handling microfilm are used in microfilm readers. Although
significant improvements have been made in transport mechanisms
in recent years, the ability to properly feed and align paper in
the read unit remains a problem which manufacturers are
attempting to solve by developing new technologies. The
disadvantage of mechanical paper handling techniques is that
they tend to jam even when high quality paper is used.
Reference 1 reports that most OCR units employ one of two types
of feeder mechanism. The first is a friction feeder which exerts
pressure on the stack and pushes the forms through a set of
rollers that separates the top form from all those below it. The
other type is a vacuum feeder which lifts the top form off the
input stack. The form is then moved past the scanner on a
conveyor belt and/or a set of rollers.

SCANNER - The scanner has the function of converting the printed
information on the document into electrical signals that will
enable the recognition system to recognize the printed
characters. As noted previously, an OCR system can fall into two
categories: Page Readers, which are capable of reading the
entire page, and Document Readers, which read only a few lines.
Therefore, the kind of scanning system suitable for an OCR
depends on the size of the area to be scanned. The most common
types of scanners used today are the mechanical disc, flying
spot, laser, and linear and area solid state arrays.

The Mechanical Disc Scanner, as depicted in Figure 6, uses a
light source whose light is reflected from the document being
scanned. The reflected light is collected and focused onto a
rotating disc containing multiple apertures. The apertures in
the disc allow only a portion of the area being scanned to
impinge on the photomultiplier at one time. The document
movement combined with the spinning disc results in a video
output representative of the character shape. Although this
method of scanning has the advantage of lower cost, it is
relatively slow in comparison to other methods.

FIGURE 6


The Flying-Spot Scanner is used extensively in page readers. It
is an example of selective illumination in which the light spot
on the CRT screen is focused on the document as indicated in
Figure 7. By deflecting the CRT beam in short vertical movements
that are displaced horizontally from one another, each character
on the line is covered in ten to twenty cycles of scan. Figure 8
shows that the scanning system slices the character many times
vertically in a TV raster-like manner and also divides each
vertical scan into cells, so that the character is in effect
described as a two-dimensional matrix of black and white cells.
The scanning raster height is typically larger than the height
of the character to allow for vertical misregistration of the
characters. This method of scan is the most versatile. The
Flying-Spot Scanner can scan in any pattern desired, and can
jump from one part of the page to another with relative ease.
This is important when rescanning has to be done. Its
disadvantages are that it is somewhat expensive and bulky, has
limited resolution, and suffers from geometrical distortions and
defocusing when the spot is deflected over large angles.


FIGURE 9 - LASER-BEAM SPINNING MIRROR SCANNER


by first passing the laser beam through a set of optics. The
function of these optics is to generate a spot of light that is
uniform in intensity. The beam is then directed onto a scanning
spinner. The spinner deflects the beam through a given angle at
a uniform angular velocity. The beam then passes through an
objective lens and a field flattening lens. Thus, as the beam
moves through a given angle, a focused spot moves across the
width of the document. During each line scan interval, the
document is moved in the direction of its height, a distance
equal to the width of the scanning spot. In this way, the entire
document can be scanned. As with the flying spot scanner, the
light reflected off the document is focused onto the face of a
photo detector and the resulting signal fed into the signal
processing electronics. The advantage of this scanning method is
its high resolution. The disadvantage is its low speed (less
than the mechanical scanner).

The Solid State Scanner technique can be designed around either
linear or area arrays. The linear array, as indicated in
Figure 10, uses a single vertical array of about 10 to 50
sensors. A bright line focused onto the document reflects onto
the sensor array. As a character moves through the optical
system, a timing circuit "strobes" the character's image as it
appears on the array. This produces a series of pulses that
represent vertical segments of the character and the white field
around it. The data for each segment is loaded into a shift
register. Therefore, after the character has been scanned, the
shift register is loaded with data representing all the
segments. Linear arrays generally break a character down into
five to ten vertical segments. The larger the number of vertical
segments, the smaller the chances of a misread character.
However, using a larger number of vertical segments to step up
reader accuracy increases the data entry time for each
character. OCR sensor-array systems therefore have to trade off
recognition accuracy against reading time. The area array
eliminates the speed and accuracy limitations of the
one-dimensional OCR system. Instead of looking at a number of
vertical segments of each character, area arrays look at the
entire character at once (Figure 11). State-of-the-art
two-dimensional photosensor arrays look at as many as 500 to
1,000 points simultaneously, in about the same length of time it
takes to read only one or two vertical segments in a linear
array.
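
The difference between the two array types can be illustrated
with a minimal Python sketch. The 7 x 5 glyph, the function
names, and the use of tuples to stand in for shift register
contents are all assumptions made for illustration; they are not
taken from any particular reader.

```python
# Hypothetical 7-row by 5-column image of the letter T.
# "1" is a black cell, "0" is a white cell.
GLYPH_T = [
    "11111",
    "00100",
    "00100",
    "00100",
    "00100",
    "00100",
    "00100",
]


def strobe_columns(rows):
    """Linear array: return one vertical segment per strobe.

    Each segment is a tuple of sensor readings (top to bottom) taken
    at one horizontal position, in the order they would be shifted
    into the register as the character moves past the array.
    """
    height = len(rows)
    width = len(rows[0])
    segments = []
    for x in range(width):
        segments.append(tuple(int(rows[y][x]) for y in range(height)))
    return segments


def area_snapshot(rows):
    """Area array: return the whole character at once as a 2-D matrix."""
    return [[int(cell) for cell in row] for row in rows]


segments = strobe_columns(GLYPH_T)   # five strobes, seven sensors each
matrix = area_snapshot(GLYPH_T)      # one exposure, 35 points
```

Both calls capture the same 35 cells; the linear version needs
one strobe per vertical segment, which is the source of the
accuracy versus reading time trade-off described above.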

RECOGNITION UNIT - The most commonly used recognition methods
are matrix matching, stroke analysis, feature extraction and
curve tracing.

In matrix matching, the electronic signals representing the
scanned character are stored in a shift register that is
connected to a series of resistor matrices. Each resistor matrix
represents a different character. The output of each matrix is
connected to a second register whose voltage outputs are
representative of what should be obtained if the referenced
character were present. The register voltage representing the
character to be identified is compared to the register voltage
of each resistor matrix contained in the character set. The
recognition circuitry then determines which character is being
read by deciding which matrix produced the maximum correlation
with the "unknown" character.
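
The maximum-correlation decision can be sketched in software.
The following Python example is a digital analogue of the idea,
not a model of the resistor-matrix hardware; the 5 x 3 templates,
the +1/-1 cell encoding, and the function names are assumptions
chosen for illustration.

```python
# Hypothetical 5-row by 3-column reference patterns ("1" = black cell).
TEMPLATES = {
    "I": ["010", "010", "010", "010", "010"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def to_bipolar(rows):
    """Flatten a pattern, mapping black cells to +1 and white cells to -1."""
    return [1 if cell == "1" else -1 for row in rows for cell in row]


def correlation(a, b):
    """Sum of cell-by-cell products: agreements add 1, disagreements subtract 1."""
    return sum(x * y for x, y in zip(a, b))


def recognize(rows, templates=TEMPLATES):
    """Return the best-matching character and the score of every template."""
    unknown = to_bipolar(rows)
    scores = {
        name: correlation(unknown, to_bipolar(pattern))
        for name, pattern in templates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# A "T" with one spurious black cell in the bottom-right corner.
noisy_t = ["111", "010", "010", "010", "011"]
best, scores = recognize(noisy_t)
```

The noisy character still correlates most strongly with the "T"
template, which is why this method tolerates a small number of
bad cells but depends on the scanned character being well
registered against the reference patterns.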

In  the  Stroke  Analysis  Technique,  recognition  is 
based  on  analysis  of  the  strokes  or  line  formation  of  each 
character  in  comparison  with  information  stored  in  the  form 
of  a truth  table  on  all  characters  in  the  set. 

Feature extraction involves the detection of certain features,
such as the major shape elements (lines, curves, etc.), which
are then evaluated via a decision tree.

Curve tracing involves the tracing of each character's outline
curvature by movement of the scanner beam or by logical
decisions of the recognition unit. This technique is most
suitable for the recognition of hand-printed numerics.

Optical Character Readers may be classified by the type of input
form processed. The document reader can handle forms up to
4 x 9 inches and read up to five lines of data in fixed
locations on a document with a single pass. A numeric character
set in a single font is usually employed in Document Readers.
The Page Reader is capable of reading numerous lines of data and
a variety of printed forms up to 8 1/2 by 11 inches with data in
various locations on the page. This reader requires a scanner or
a transport capable of precise positioning. The speed of
searching and scanning and the capabilities for selective
reading govern throughput more than the speed at which the paper
moves.

The  two  basic  modes  of  operation  for  OCR  equipment 
are  the  direct  read  mode  and  the  retype  mode.  When  the
generation  of  source  data  can  be  closely  controlled,  i.e., 
type  font  used,  quality  of  printing,  size  of  document,  etc., 
the  direct  read  mode  is  used  whereby  source  documents  are  read 
directly  by  OCR  equipment.  When  the  generation  of  source 
data  cannot  be  controlled,  the  retype  mode  is  used  whereby 
some  or  all  of  the  source  data  are  retyped  on  standard  pages 
using  the  appropriate  type  font. 

D.  MIXED-MEDIA: 

This system consists of an OCR system operating in conjunction
with a key-to-disc unit. The key-to-disc portion is used to
correct optical reading errors. It is a standard commercially
available system with special software for combining OCR and
keyboard data. Most mixed media systems let the OCR portion of
the system read input data. The key-to-disc portion of the
system handles rejects from the OCR pass or source documents
recorded in a non-OCR readable font. The combining of
keyboard-to-disc and optical readers into one data entry system
represents the proper evolution in design of a complete data
entry system. Such a system permits the user to re-enter via
keyboard all OCR unreadable documents, thus significantly
reducing the total number of rejects.

E.  VOICE  INPUT: 

Voice data entry permits the operator to converse with the
computer in his most natural form of communication - voice.
Reference 6 states that in a previous report, RADC conducted a
study to compare voice input with keyboard and graph light pen
methods of data entry. The results indicated that for certain
types of applications, voice input offers considerable
advantages over other means of data entry. In data entry tasks
where the operators' hands and eyes are busy, voice input allows
the operators to concentrate on that task instead of having
their attention diverted by a keyboard or CRT terminal. Also, in
some applications where operator movement is crucial, voice
input provides the operator freedom of movement as the task
requires. This reference also states that although there are
considerable advantages to entering data by voice, limitations
are imposed by such factors as speed, accuracy, environmental
noise, operator experience and degree of hand/eye occupation.

Most of the work done to date on voice data entry has been on
the highly restricted form of spoken input known as "isolated
words". Single words are spoken in isolation with distinct
pauses before and after each word which act as boundary markers
to show where a word begins and ends. The pattern of the speech
wave between those time markers can then be analyzed without
having to consider the effects of surrounding words. Thus,
machine recognition of isolated spoken words is similar to human
recognition of typed or handwritten words separated by clear
spaces. Machine recognition of continuous speech is more
complex, and is analogous to human recognition of typed or
handwritten sentences without the appropriate spaces between
words.
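
The use of pauses as boundary markers can be sketched as a simple
energy-threshold segmenter. This Python example is only an
illustration of the principle under assumed inputs: the
per-frame energy values, the threshold, and the minimum pause
length are hypothetical, and the report does not describe how
any specific recognizer locates word boundaries.

```python
def find_words(energies, threshold, min_pause):
    """Locate isolated words in a sequence of per-frame energy values.

    A word starts at the first frame whose energy reaches the threshold
    and ends once at least `min_pause` consecutive quiet frames follow.
    Returns a list of (start, end) frame indices, with `end` exclusive.
    """
    words = []
    start = None       # index of the first frame of the current word
    silent_run = 0     # quiet frames seen since the last loud frame
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause:
                words.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Input ended before a full pause was observed.
        words.append((start, len(energies) - silent_run))
    return words


# Two words; the single quiet frame inside the second word is too
# short to count as a boundary.
frames = [0, 0, 5, 6, 7, 0, 0, 0, 4, 5, 0, 6, 0, 0, 0]
boundaries = find_words(frames, threshold=3, min_pause=3)
```

With continuous speech there are no pauses of the required
length between words, so a segmenter of this kind returns one
long span and the recognizer must deal with word boundaries by
other means.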

Just as different individuals have different handwriting,
different speakers have different "voice signatures," such that
from the standpoint of the machine the same words are expressed
differently by different speakers. The equipment may be designed
to handle only one speaker's voice, or to adapt to many voices;
thus, a distinction is made between "speaker-dependent" and
"speaker-independent" systems. Reference 6 states that most of
the automatic speech recognition systems in operation are
speaker-dependent systems. Each speaker must "train" the system
to recognize a particular speech pattern. Speech recognition is
also influenced by environmental noise and by other factors such
as variation in pronunciation due to vocal fatigue, emotions,
etc. Therefore, most speech recognizers have been confined to
applications involving carefully spoken isolated words in fairly
quiet rooms, thus easing the recognition task.

Although much progress has been made in the field of voice input
technology, particularly continuous speech recognition, further
major improvements are required. The experts agree that the
problem of voice input is not entirely solved. While current
technology offers several commercial devices for isolated word
recognition, there are still "gaps" that must be filled before
speech recognizers will achieve their full potential as tools
for conversing with machines. Reference 6 provides a current
status of the problems to be solved in the voice input
technology area before this method of data entry can be
efficient and effective. The problems are noted below:

1. Words must be spoken in isolation, which causes a
considerable reduction in speed as well as a loss of the
"naturalness" of voice communications. Also, the problem of
connected speech recognition is extremely difficult and unlikely
to be solved for larger vocabularies in the near future.

2. Intra-speaker and inter-speaker variability presents a major
problem. Intra-speaker variability can cause recognition rates
to decrease if each speaker's reference patterns are not updated
periodically. Experiments have shown that voice characteristics
change slowly with time. Inter-speaker variability requires that
every speaker establish and use his own reference patterns. This
has a direct impact on system cost.

3. Other problems include variation in speech pattern when an
individual becomes annoyed, operator training and orientation,
and differences between the speech pattern used during
vocabulary training of the machine and that used during actual
use.

RADC/IRA has a laboratory director's fund effort underway. The
objective of this effort is to investigate techniques of voice
data entry using Automatic Speech Recognition (ASR) equipment in
conjunction with other automatic data entry equipment, such as
keyboards and dynamic character pens. The research will focus on
methods of combining rather than comparing these automatic data
entry aids to provide high data throughput, low system error
rates and an efficient man-machine interface.

IV. SOURCE DATA AUTOMATION EQUIPMENT PROFILE:


A source data equipment entry guide was compiled by the
Operations Research Division of the Air Force Data Systems
Design Center located at Gunter AF Station AL (Reference 7).
This document represents the most complete and comprehensive
summary of the description of source data automation equipment
discovered during the literature search and data collection
phase of the study. Although the report was published
approximately three to four years ago, it is felt that the
performance of source data automation equipment has not changed
substantially since that time; perhaps only the prices require
adjusting due to inflation.

The Data Entry Guide is intended to serve as a reference for
"state-of-the-art" Source Data Automation (SDA) equipment. The
reader should be able to use this guide as the first step in
selecting SDA equipment (i.e., choosing an equipment category
that may offer a solution to his data entry problem).

The equipment profiles for the different SDA methods contain
descriptions, characteristics, typical applications, and
representative equipment. They present the basic information
necessary to select equipment categories meeting the
requirements of a source data entry application.

The Data Entry Guide is divided into three chapters - Keyed
Devices, Reader Devices, and Special Input Devices. Each chapter
is further subdivided into equipment categories or profiles
(such as keyboard-to-disk, optical character readers, voice
recognition systems, etc.). Within the equipment profiles, the
following information is supplied:

A. A general description of the equipment category including its
definition, characteristics, and options (normally available).

B. Operator requirements normally associated with the equipment.

C. A cost range for the equipment.

D. Typical applications of the equipment.

E. Advantages and disadvantages of the equipment.

F. A list of representative equipment including features and
specifications.

The SDA document warns that the guide is not intended to supply
comprehensive specifications for every data entry device (or
system), but rather to serve as a sample that introduces the
reader to the types of SDA equipment available. It further
cautions that when consideration is given to implementing an SDA
system, attention should be given to all equipment available and
not just the items in the SDA document. There are several
reference sources (such as Auerbach, Datapro and Data Entry
Today) which cover specific data entry devices in greater
detail.

The SDA equipment guide is quite comprehensive. It is
recommended that all potential procurers of SDA equipment obtain
a copy of the guide from AFDSDC/DMB, Gunter AFS AL.

V.  ECONOMIC  CONSIDERATIONS: 

A sizeable portion of the cost of a typical data processing
operation is contained in the data entry area. References 1 and
2 indicate that the share of operating cost consumed by the data
entry process ranges from 25 to 30 percent on the low end, and
may go as high as 40 to 50 percent depending upon the specific
application and system configuration. These estimates pertain to
data processing installations in which the data is input via
keyboard, thus requiring a large number of operators and the
associated labor cost. For keyboard entry methods, the operating
cost is directly proportional to the data volume to be
processed. For OCR methods, the operating cost is not directly
proportional to volume. Nevertheless, the basis for determining
the cost effectiveness of a particular data entry method lies
primarily in the volume of data to be input. As noted previously
in the report, OCR systems are assuming a greater role in the
data entry process whereas voice input is still largely
developmental. Consequently, only keyboard entry and OCR
techniques will be considered from an economic standpoint. In
the majority of cases, OCR systems are used as a direct
replacement for keypunching operations. Whether or not it would
be cost-effective to convert to OCR cannot be answered
satisfactorily without first examining the specific application,
present cost incurred in data preparation, input accuracy
requirements, and more importantly the volume requirements. A
simple cost/performance comparison can be made between OCR and
other methods of data entry by comparing labor and equipment
costs. Reference 1 presents the following formula for making the
evaluation:

F = a / (b + c), where

F = number of characters processed per dollar,
a = total characters processed per month,
b = monthly equipment rental and overhead cost,
c = monthly personnel costs.

Since data entry system cost/performance is normally specified
in terms of cost per character rather than characters per
dollar, the reciprocal of the value "F" in the formula will be
used to present the results of the following analysis, which
establishes an approximate relationship between the cost per
character and volume for keyboard entry and OCR systems.
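
Reading the formula as F = a / (b + c), which is what the
definitions of a, b and c imply, the two measures can be written
as a pair of small Python functions. The function names and the
sample figures below are illustrative assumptions and do not
come from the report.

```python
def characters_per_dollar(a, b, c):
    """F = a / (b + c).

    a: total characters processed per month
    b: monthly equipment rental and overhead cost, in dollars
    c: monthly personnel costs, in dollars
    """
    return a / (b + c)


def cost_per_character(a, b, c):
    """Reciprocal of F: dollars spent per character processed."""
    return (b + c) / a


# Illustrative figures only: 1,000,000 characters per month,
# $1,000 equipment cost and $4,000 personnel cost.
f_value = characters_per_dollar(1_000_000, 1_000, 4_000)
unit_cost = cost_per_character(1_000_000, 1_000, 4_000)
```

For these sample figures F is 200 characters per dollar, which
corresponds to half a cent per character.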

In order to establish the above relationship for a typical
keypunch operation, the following assumptions were made:

1. Operator Speed - 5,500 keystrokes per hour.

Reference 7 states that most sources (e.g., Reference 10) rate
the average keypunch operator at 10,000 keystrokes per hour on
unbuffered devices and slightly higher on buffered devices.
However, this speed does not take into consideration such
factors as card and document handling, coffee breaks, error
correction time, etc. These factors restrict the sustained
average speed in a typical commercial application to
approximately 5,500 keystrokes per hour. Use of double entry for
purposes of verification is not considered.

2.  Keypunch  Rental  Fee  - $100  per  month. 

Reference 7 indicates a range of $35-$200 per month for lease of
an unbuffered unit ($60-$260 for a buffered unit). Reference 10,
which is a later reference (1977), indicates a keypunch rental
fee of $90 per month.

3.  Number  of  Operators  - Variable. 

The  number  of  operators  required  to  input  the  data  is 
dependent  upon  the  input  volume.  Therefore,  a different  number 
of  operators  will  be  required  and  subsequently  computed  for 
each  input  volume  considered. 

4.  Operator  Salary  - $1,250  per  month. 

This  estimate  includes  fringe  benefits  and  overhead. 

5. Number of Supervisors - 1

6. Supervisor Salary - $1,500 per month.

This estimate also includes fringe benefits and overhead.

7. Hours per Day - 7

8. Days per Month - 21

9. Characters per Document - 100

10. Input Volume - Variable

An input volume will be considered which varies from 1,000
documents per day to 10,000 documents per day in increments of
1,000 documents per day.

Table I is a summary of all the calculations for the volume
range considered. For a volume of 1,000 documents per day, the
following values were computed as noted below:

1. Number of Operators =
   (1000 Doc/Day x 100 Char/Doc) / (5500 Char/Hr x 7 Hr/Day),
   i.e., 2.6, rounded up to 3 operators

2. "a" = 3 (operators) x 5500 (Char/Hr) x 21 (Days/Mo) x 7 (Hr/Day)

3. "b" = 3 (stations) x $100/Mo

4. "c" = 3 (operators) x $1250/Mo + 1 (Supv.) x $1500/Mo
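
The calculation for the 1,000 documents per day case can be
reproduced with a short Python sketch built from the ten
assumptions listed above. The function name and the dictionary
layout are illustrative choices, and the resulting figures are
computed here from those assumptions rather than copied from
Table I.

```python
import math


def keypunch_case(docs_per_day,
                  chars_per_doc=100,
                  keystrokes_per_hour=5500,
                  hours_per_day=7,
                  days_per_month=21,
                  station_rental=100,      # dollars per station per month
                  operator_salary=1250,    # dollars per month
                  supervisors=1,
                  supervisor_salary=1500):
    """Compute operators, a, b, c and cost per character for one volume."""
    daily_chars = docs_per_day * chars_per_doc
    per_operator_day = keystrokes_per_hour * hours_per_day
    # A fraction of an operator cannot be hired, so round up.
    operators = math.ceil(daily_chars / per_operator_day)
    # As in step 2 above, "a" is what the staffed operators can key in a month.
    a = operators * keystrokes_per_hour * days_per_month * hours_per_day
    b = operators * station_rental
    c = operators * operator_salary + supervisors * supervisor_salary
    return {
        "operators": operators,
        "a": a,
        "b": b,
        "c": c,
        "cents_per_char": 100 * (b + c) / a,
    }


case = keypunch_case(1000)
```

Under these assumptions the sketch gives 3 operators,
a = 2,425,500 characters per month, b = $300 and c = $5,250, or
roughly 0.23 cents per character.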


In  order  to  establish  the  relationship  for  a key-to-disc 
operation,  the  following  assumptions  were  made: 

1. Operator Speed - 7,150 keystrokes per hour.

Reference 7 states that the sustained average speed over a long
period of time for a key-to-disc operator is approximately 7,150
keystrokes per hour.


TABLE  I - KEYPUNCH  COST/PERFORMANCE  DATA 


2. Key-to-Disc Rental Fee -

Reference 7 indicates the following average lease cost per month
per unit for shared processor systems:

4 Station System - $250.00 (Estimate)

8 Station System - $213.00

16 Station System - $158.00

The remaining assumptions made for the keypunch operation are
also applicable for the key-to-disc operation. Table II is a
summary of the calculations made for the key-to-disc operation.

In  order  to  establish  the  relationship  on  OCR  systems, 
the  following  assumptions  were  made: 

1. OCR THROUGHPUT - 1,500 pages per hour

The throughput of a page reader is dependent upon the number of
lines to be read and the number of fields within each line.
Therefore, the rate obtained with any OCR device is a function
of the design of the form and the device reading speed in
characters per second. The above estimate was based on data
contained in Reference 10. For both single and multifont
machines, the OCR throughput was given as 1,330 pages per hour
at 0.8 effectiveness, assuming 200 characters per page. Since
this analysis assumed only 100 characters per page, the above
throughput estimate of 1,500 pages per hour is considered to be
a conservative estimate. In any case, it is still sufficient to
handle the daily volume range considered in this study.

2. OCR RENTAL FEE

The data contained in Reference 10 are used as typical values
for the rental cost of single and multifont page readers. This
reference states that the monthly rental fee for a single font
machine is $5,700 and the monthly fee for a multifont machine is
$16,000.

The values of the other parameters are the same as in the
previous cases. Tables III and IV contain the calculations for
the single and multifont machines respectively.

A graph depicting a comparison between these methods of data
entry is shown in Figure 12. The results indicate that for low
volumes (approximately 2,000 documents per day or less) the
keypunch and key-to-disc data entry methods are more cost
effective than either of the OCR devices. As noted in the
figure, the cost per character processed via keypunch and
key-to-disc is less than the cost per character processed on a
single font OCR device (curve 3) for volumes less than 2,000
documents per day. The figure also indicates that for a
multifont OCR device (curve 4), the break-even volume when
compared with the keypunch approach is approximately 5,000
documents per day. The break-even volume is approximately 6,000
documents per day when compared to a key-to-disc approach.


TABLE III - OCR (SINGLE FONT) COST/PERFORMANCE DATA


TABLE IV - OCR (MULTIFONT) COST/PERFORMANCE DATA


FIGURE 12 - COST PER CHARACTER VERSUS VOLUME


This  difference  is  reasonable  since  the  key-to-disc  approach 
is  more  efficient  than  the  keypunch  approach.  Consequently, 
a single  font  OCR  device  is  more  cost  effective  than  keyboard 
entry  methods  (particularly  keypunch  and  key-to-disc)  in  appli- 
cations where  the  volume  of  characters  to  be  processed  exceeds 

200.000  characters  per  day  (2,000  documents  per  day  x 100 
characters  per  document).  A multifont  OCR  is  not  cost  effec- 
tive until  the  volume  of  characters  to  be  processed  exceeds 

600.000  characters  per  day  (6,000  documents  per  day  x 100 
characters  per  document).  Figure  13  shows  the  relationship 
between  the  monthly  operating  cost  and  volume  for  the  various 
data  entry  methods.  The  monthly  operating  cost  of  keypunch 
and  key-to-disc  methods  are  directly  proportional  to  the 
volume  processed  with  the  proportionality  constant  being 
slightly  higher  for  the  keypunch  approach  than  for  the  key- 
to-disc  approach.  This  figure  also  ’ndicates  that  the  monthly 
operating  cost  for  both  OCR  devices  is  independent  of  volume 
(at  least  for  the  volume  range  considered).  This  relationship 
would  hold  until  the  volume  requirements  exceed  the  thruput 
capability  of  a single  device.  The  acquisition  of  a second 
device  would  cause  curves  3 and  4 of  Figure  13  to  be  elevated 
to  reflect  the  new  operating  cost.  As  expected,  the  break-even 
points  for  the  monthly  operating  cost  of  the  various  data 
entry  methods  occur  at  the  same  volumes  as  noted  in  the 
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previous  discussion.  The  CRT  terminal  approach  does  not 
appear  on  the  graphs  of  Figures  12  and  13.  It  is  considered 
similar  to  the  key-to-disc  approach  in  that  the  CRT  terminal 
would  replace  the  keyboard  in  a key-to-disc  system.  The  cost 
of  this  system  would  be  somewhat  higher  since  the  hardware 
cost of the terminal should be more than the cost of a keyboard,
and a more sophisticated software package is required.
However,  the  data  keystroking  rate  is  approximately  the  same. 
Therefore,  a curve  reflecting  the  CRT  terminal  approach  would 
fall  somewhere  between  the  keypunch  and  key-to-disc  curves 
on  Figures  12  and  13. 
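
The break-even volumes discussed above can be checked with a short sketch. The 100 characters per document and the 2,000 and 6,000 documents-per-day thresholds come from the text; the function names and the generic flat-versus-proportional cost model are illustrative assumptions, not values or methods taken from the study.

```python
CHARS_PER_DOCUMENT = 100  # figure used in the text


def chars_per_day(documents_per_day, chars_per_document=CHARS_PER_DOCUMENT):
    """Convert a daily document volume into a daily character volume."""
    return documents_per_day * chars_per_document


def break_even_volume(flat_monthly_cost, cost_per_unit):
    """Volume at which a volume-proportional cost (keyboard entry)
    equals a volume-independent cost (a single OCR device).
    Both arguments are hypothetical inputs supplied by the caller."""
    return flat_monthly_cost / cost_per_unit


single_font_break_even = chars_per_day(2_000)  # 200,000 characters per day
multifont_break_even = chars_per_day(6_000)    # 600,000 characters per day
print(single_font_break_even, multifont_break_even)
print(multifont_break_even / single_font_break_even)  # multifont needs 3x the volume
```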

VI. SYSTEM CONCEPTS:

A discussion  of  Source  Data  Automation  system  concepts 
can  best  be  accomplished  by  a consideration  of  the  following 
factors:

SOURCE  DOCUMENT  CREATION  - This  process  is  typically 
accomplished  by  an  operator  typing  in  the  required  information 
in  the  appropriate  data  fields  on  pre-designed  forms.  Source 
document  creation  may  also  be  accomplished  (not  necessarily 
cost  effectively)  in  a manner  similar  to  the  process  used  by 
RADC/IS in the computer generation of Project Management
(Johnson-Beers) forms. The software for the computer generation
of the form on an Anderson-Jacobson printer is prestored.
In addition, the software required for the identification
of the various data fields for keyboard entry of the
appropriate data during update is also stored in the computer
system.  This  software  facilitates  the  creation  of  a new  form 
in  addition  to  the  update  of  a selected  form. 

DATA  CAPTURE  LOCATION  - Data  from  the  source  documents 
may  be  captured  at  the  location  where  the  document  was  created, 
or  at  a central  location.  Therefore,  in  the  former  case, 
there  would  be  as  many  data  capture  locations  as  there  are 
source  document  locations,  and  in  the  latter,  only  one  loca- 
tion would  serve  as  the  data  capture  facility. 

SOURCE  DOCUMENT  TRANSFER  - In  cases  where  the  data  cap- 
ture location  is  different  from  the  location  where  the  source 
document  is  created,  the  document  must  be  transferred  to  the 
data  capture  location.  This  may  be  accomplished  via  the  U.S. 
Postal  Service  or  electronic  transmission. 

DATA  TO  BE  CAPTURED  - The  type  of  data  typically  cap- 
tured is  source  document  index  data.  This  data  forms  the 
basis  of  a digital  index  file  which  is  used  by  the  computer 
system  to  control  the  input  and  retrieval  of  all  source 
documents  in  the  document  file. 

DATA  CAPTURE  TECHNIQUES  - Data  from  source  documents  may 
be  captured  either  manually  or  automatically.  Manual  data 
capture  typically  involves  an  operator  inputting  system  data 
via  a keyboard.  Automatic  methods  of  data  capture  from 
source  documents  include  the  optical  character  recognition 
approach  and  voice  input. 


The  existing  AFMPC  System  will  be  described  in  order  to 
provide  a basis  for  comparison  with  other  source  data  auto- 
mation system  concepts.  Figure  14  depicts  the  concept  of  the 
existing  system.  Source  (paper)  documents  are  created  at  each 
of  the  122  Consolidated  Base  Personnel  Offices  (CBPOs)  in 
addition  to  other  locations  such  as  recruiting  centers.  The 
documents  are  created  by  an  operator  who  types  in  the  required 
information in the appropriate data fields of numerous predesigned
forms with a standard typewriter. Reference 8 indicates
that 257 different document types are authorized for
entry into the microform personnel record. The amount of data
typed on each document is a function of the purpose served by
the document.

The  data  is  captured  from  the  source  documents  at  the 
File  Maintenance  Facility  located  at  AFMPC,  Randolph  AFB  TX. 
Therefore,  the  documents  created  at  the  CBPOs  are  trans- 
ferred to  the  AFMPC  for  processing  via  the  US  mail  delivery 
service.  The  current  estimate  is  that  approximately  6,000 
documents  arrive  daily  at  the  AFMPC  for  updating  the  master 
records  of  extended  active  duty  personnel  and  creating  record 
files  for  new  accessions. 

The  data  captured  at  the  File  Maintenance  Facility  is 
the  microform  index  data.  This  data  is  used  to  maintain  an 
index  file  which  is  used  to  facilitate  the  search  and  re- 
trieval of  microfiche  located  in  the  microfiche  file. 

FIGURE 14 - CONCEPT OF EXISTING SYSTEM


The  data  is  captured  via  keyboard  with  a group  of 
operators  working  in  an  on-line  environment.  CRT  terminals 
are  used  to  enter  the  index  data  into  the  system.  Seven 
operators  currently  perform  the  indexing  function  and  they 
have  demonstrated  their  ability  to  handle  the  daily  document 
volume  during  a single  shift. 

In  summary,  the  microfiche  file  update  process  at  the 
AFMPC  File  Maintenance  Facility  involves  receiving  update 
documents  created  at  the  various  CBPOs  and  forwarded  by  U.S. 
Mail  Delivery  Service,  extracting  and  entering  the  microform 
index  data  via  a keyboard-to-storage  technique  which  utilizes 
the  CRT  terminal,  photographically  recording  the  new  document 
on  film  and  automatically  mounting  the  new  document  image  on 
the  appropriate  fiche. 

Four  System  Concepts  are  discussed.  They  represent  a 
range  of  views.  The  first  two  concepts  consider  the  automation 
of  selected  functions  in  the  current  system.  Concept  1 dis- 
cusses the  document  transfer  function  from  the  CBPOs  to  AFMPC 
and  Concept  2 discusses  index  data  capture  at  AFMPC.  Although 
discussed  as  separate  concepts,  these  two  functions  could  have 
been  combined  and  presented  as  a single  concept.  The  last  two 
concepts  represent  a more  radical  view  in  that  the  field  paper 
record  is  eliminated.  In  Concept  3,  a coded  digital  copy  is 
substituted  in  its  place.  However,  in  Concept  4,  the  field 
record  does  not  exist  in  any  form  and  the  data  is  provided 
by  AFMPC. 

CONCEPT 1 is depicted in Figure 15. It considers incorporating
into the current system a higher degree of automation
for the document transfer function from the CBPOs to AFMPC.
The concept involves the creation of source documents at the
CBPOs  in  the  same  manner  as  is  currently  done,  the  use  of  a 
document  facsimile  scanner  to  transmit  the  documents  electron- 
ically to  the  AFMPC  and  a facsimile  printer  at  the  AFMPC  for 
hard  copy  reproduction.  The  resulting  paper  documents  would  be 
processed as in the current system. This concept would eliminate
the postal service time lag in delivery of the documents
to AFMPC from the various CBPOs. However, there are two significant
disadvantages associated with this concept. First, a
significant  cost  in  data  communications  and  equipment  is 
incurred,  and  second,  the  legibility  of  the  small  characters 
on  the  forms  will  be  marginal  on  the  hard  copy  reproduction. 

An  estimate  of  the  data  communications  cost  incurred  in 
this  concept  may  be  established  from  the  volume  of  documents 
to  be  transmitted  and  the  cost  of  the  transmission  link.  The 
exact  number  of  documents  originating  from  the  122  CBPOs  is 
not known. However, the current estimate is that approximately
6,000 documents arrive daily at the AFMPC to be used for
updating the master personnel records of the extended active
duty personnel and creating record files for new accessions.
This results in a daily average of approximately 50 documents
originating from each CBPO (a small percentage of the 6,000
documents originate from other sources such as recruiting
offices, etc.). This represents an electronic traffic of
100 images (2 images per document) per day per CBPO. Based
on  Figure  16  which  was  taken  from  a previous  study  conducted 
by  the  author  (Reference  9),  a 9.6  KBPS  line  can  transmit 
100  images  per  day  in  four  hours  of  actual  transmission  time. 
This  graph  presupposes  that  the  100  images  have  been  scanned, 
compressed,  stored  as  a contiguous  block  in  a digital  buffer, 
and  transmitted  without  interruption.  Considering  a few 
seconds  to  load  and  unload  each  document,  the  total  time  is 
extended  by  only  three  minutes.  Using  a conversion  factor  of 
21  days  per  month,  the  total  hours  of  transmission  time 
required  per  month  is  approximately  85  hours.  Figure  17 
which was also taken from Reference 9 indicates that, for a transmission
time of 85 hours per month, it is more cost effective
to lease a communications line. For purposes of estimating
the monthly lease cost for a 122-CBPO network, an average
distance of 1,000 miles between a CBPO and the AFMPC was used.
Figure 18 (from Reference 9) indicates that the monthly lease
cost for a 9.6 KBPS line at that distance is $1,300. Therefore,
the monthly cost for 122 CBPOs would be $158,600. For
comparative  purposes,  an  estimate  of  the  cost  of  postal 
delivery  service  can  be  established  based  on  the  current  rates. 
U.S. Postal rates for first class mail are $0.15 for the
first  ounce  and  $0.13  for  each  additional  ounce.  Therefore, 
the  cost  (in  dollars)  of  the  postal  service  may  be  computed 
from  the  relationship: 

     C = 0.15 + 0.13 (n - 1)     where "n" equals the number of ounces

A document is estimated to weigh 0.2 ounces, so the 50 documents
mailed daily by each CBPO weigh 10 ounces. Therefore,
the estimated daily cost of postal delivery service for each
CBPO to mail 50 documents is $1.32. Using 21 days
per month, the monthly cost for all 122 CBPOs is $3,382.00.
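
The communications and postal estimates above can be reproduced with the following sketch. All dollar figures, rates, and counts are taken from the text; the variable and function names are illustrative, and treating each CBPO's daily mailing as a single 10-ounce package is an assumption inferred from the $1.32 result.

```python
import math

CBPO_COUNT = 122
DAYS_PER_MONTH = 21

# Electronic transfer (facsimile over a leased 9.6 KBPS line).
docs_per_cbpo = 6_000 / CBPO_COUNT           # about 49.2; the text rounds to 50
images_per_day = 50 * 2                      # two images per document
line_hours_per_month = 4 * DAYS_PER_MONTH    # 84; the text uses approximately 85
leased_lines_monthly = 1_300 * CBPO_COUNT    # dollars per month for all CBPOs


def first_class_postage(ounces):
    """C = 0.15 + 0.13 (n - 1), with n the weight in whole ounces.
    Rounding before ceil guards against floating-point noise."""
    whole_ounces = max(1, math.ceil(round(ounces, 6)))
    return 0.15 + 0.13 * (whole_ounces - 1)


# Postal transfer (assumed: one package of 50 documents per CBPO per day).
daily_postage = first_class_postage(50 * 0.2)
postal_monthly = daily_postage * DAYS_PER_MONTH * CBPO_COUNT

print(f"leased lines: ${leased_lines_monthly:,} per month")
print(f"postal:       ${postal_monthly:,.2f} per month")
```

On these figures the leased-line network costs roughly 47 times as much per month as the postal service.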

The  equipment  cost  estimate  involves  the  cost  of  a 
document  facsimile  scanner  at  all  the  CBPOs,  and  the  cost  of 
document  facsimile  printers  at  the  AFMPC.  A cost  estimate 
of  $21,000  was  used  for  the  document  scanner.  This  estimate 
was  based  on  the  price  of  a typical  document  scanner,  e.g., 
model  DDS  240  (240  lines  per  inch  resolution),  manufactured 
by  Dest  Data  Corporation  of  Sunnyvale,  CA.  Therefore,  the 
total  cost  estimate  for  122  CBPOs  is  approximately  $2.5 
million. The estimate for the purchase price for a facsimile
printer is $10,000 (based on GSA data on the DACOM printer).
A facsimile printer requires 90 to 120 seconds to produce a
hard copy document. Assuming a production requirement of
6,000 documents per day, 25 printers will be required at
AFMPC to meet the daily workload. This results in a cost
of $250,000.00. Therefore, the total equipment cost estimate
is  approximately  $2.75  million. 
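
A minimal sketch of the equipment arithmetic follows. Prices and volumes are from the text; the 8-hour shift and the use of the slower 120-second print time are assumptions chosen because they reproduce the 25-printer figure. The unrounded scanner total is $2,562,000, which the text rounds to $2.5 million, so the unrounded sum comes out somewhat above the $2.75 million quoted.

```python
import math

SCANNER_PRICE = 21_000       # per CBPO, from the text
PRINTER_PRICE = 10_000       # per facsimile printer, from the text
CBPO_COUNT = 122
DOCS_PER_DAY = 6_000
SECONDS_PER_COPY = 120       # upper end of the 90 to 120 second range
SHIFT_SECONDS = 8 * 3600     # assumption: one 8-hour shift per day

scanner_total = SCANNER_PRICE * CBPO_COUNT
printers_needed = math.ceil(DOCS_PER_DAY * SECONDS_PER_COPY / SHIFT_SECONDS)
printer_total = printers_needed * PRINTER_PRICE
equipment_total = scanner_total + printer_total

print(printers_needed)               # printers required at AFMPC
print(f"${scanner_total:,}")         # scanners at all CBPOs
print(f"${equipment_total:,}")       # unrounded total equipment cost
```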

The  legibility  deficiency  associated  with  hard  copy  pro- 
duction was  cited  in  the  above  mentioned  study  (Reference  9). 
The  commercial  facsimile  printers  available  are  adequate  for 
reproducing  typewritten  letters  where  the  character  point 
size  is  in  the  neighborhood  of  eight  points.  However,  for 
character  sizes  in  the  neighborhood  of  three  to  four  point 
type,  which  is  characteristic  of  the  field  identifiers  on 
Air  Force  documents,  the  legibility  of  the  hard  copy  products 
is  at  best  marginal. 

CONCEPT  2 is  depicted  in  Figure  19.  It  considers 
incorporating  a higher  degree  of  automation  for  microform 
index  data  capture  at  the  AFMPC  File  Maintenance  Facility 
at  Randolph  AFB  TX.  This  concept  involves  the  creation  of 
source  documents  at  the  CBPOs  utilizing  an  OCR  typewriter 
with  a specialized  OCR  font,  the  transfer  of  those  documents 
to  the  File  Maintenance  Facility  at  AFMPC  via  U.S.  Postal 
Service,  and  the  capture  of  microform  index  data  via  an  OCR 
reader.  This  concept  results  in  a minimum  impact  on  the 
current  system.  A potential  reduction  in  the  number  of 
operators  would  result  since  one  OCR  device  is  more  than 
capable  of  handling  the  daily  document  volume.  However,  a 
major  disadvantage  associated  with  OCR  technology  is  the 
strict requirements and controls on forms, character set and
font. A forms redesign with a specialized OCR font to facilitate
the forms identification function, as well as a redesign
of  the  data  fields  to  facilitate  the  OCR  scanning  function 
would  be  required  in  order  to  incorporate  this  technology. 

The  quality  of  input  is  critical  to  the  success  of  the  OCR 
device.  Also,  the  restrictions  on  font,  forms  design  and  the 
document printer are necessary to obtain an acceptable thruput
with a minimum of rejects due to "no reads," and more
importantly, a minimum of substitution errors. The greater
the extent to which the quality conditions and other restrictions
are met, the simpler is the machine required. Therefore,
a direct relationship exists between forms control and
the price of an OCR device. Recall that in a previous section
(Economic  Considerations),  it  was  stated  that  the  monthly 
rental  fee  on  a typical  single  font  page  reader  was  considered 
to  be  about  $5,700.00.  The  author  of  Reference  10  from  which 
this  figure  was  taken  is  a representative  of  Recognition 
Equipment  Inc.  A recent  telecon  to  Recognition  Equipment  Inc. 
(a  leading  manufacturer  of  OCR  equipment)  revealed  that  the 
purchase  price  of  their  single  font  page  reader,  model  S80/88 
is  $235,300.00.  This  price  represents  a full-up  configuration 
which  includes  two  tape  drives,  a printer,  a CRT  terminal  for 
re-entry of rejected data, a control computer, and a page
numbering device. The above figure does not include maintenance.
The monthly fee for maintenance would be $2,933.00.
This telecon also revealed that the monthly rental fee for a
three-year lease period ranges from $4,000 to $7,000 depending
upon the equipment configuration. The five-year lease
rental fee range is $3,200 to $6,500. The monthly rental
fee includes the maintenance cost. The cross-over time for
determining  the  cost  effectiveness  of  an  equipment  purchase 
versus  an  equipment  rental  can  be  estimated  by  comparing  the 
purchase  price  and  monthly  maintenance  fee  with  the  monthly 
rental  fee.  Based  on  an  initial  purchase  price  of  $235,300 
along  with  a monthly  maintenance  fee  of  $2,933,  the  cross- 
over time  for  that  particular  unit  would  be  seven  years  when 
compared  to  a monthly  rental  fee  of  $5,700.  Therefore,  for 
an  equipment  life  cycle  of  seven  years  or  less,  it  appears 
that  equipment  rental  is  the  more  cost  effective  approach. 

If  that  time  is  significantly  exceeded  (e.g.,  10  years  or 
more),  consideration  should  be  given  to  equipment  purchase. 
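
The purchase-versus-rental cross-over can be sketched as follows. The prices are those quoted in the text; the function name is illustrative, and the model assumes constant fees with no discounting, which is a simplification.

```python
def crossover_months(purchase_price, monthly_maintenance, monthly_rental):
    """Months after which cumulative rental (maintenance included)
    exceeds purchase price plus cumulative maintenance."""
    monthly_saving = monthly_rental - monthly_maintenance
    if monthly_saving <= 0:
        raise ValueError("rental never costs more than owning")
    return purchase_price / monthly_saving


months = crossover_months(235_300, 2_933, 5_700)
years = months / 12
print(f"{months:.1f} months, about {years:.1f} years")
```

With these inputs the cross-over falls just past seven years, consistent with the estimate in the text.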

CONCEPT  3 is  depicted  in  Figure  20.  This  concept  repre- 
sents a radical  departure  from  the  current  system  in  that 
the  paper  document  is  eliminated,  and  a coded  digital  version 
of  the  field  record  is  retained  at  each  field  location  for 
all  assigned  personnel.  It  involves  the  use  of  a CRT  terminal 
for "source document creation" at the CBPOs. All the background
forms would be prestored in the system's mass storage
unit.  The  operator,  in  conjunction  with  the  CRT  keyboard, 
would  retrieve  a specific  background  form  and  insert  the 
appropriate  information  in  the  respective  data  fields.  The 
data  inserted  in  the  data  fields  along  with  the  data  necessary 
to  identify  the  form  would  then  be  transmitted  electronically 
to  the  AFMPC.  The  output  product  at  the  AFMPC  would  be  a 
microimage  of  all  the  transmitted  images  generated  on  computer- 
output-microfilm  equipment. 

The  communications  cost  estimate  for  this  concept  was 
determined  to  be  approximately  40  percent  of  the  cost  when  trans- 
mitting in  a facsimile  mode  as  in  Concept  1.  Alphanumeric 
as  opposed  to  facsimile  transmission  is  employed.  Unlike 
facsimile  transmission,  alphanumeric  transmission  is  the 
transmission  of  the  coded  digital  representation  of  the 
characters  contained  in  the  message.  In  facsimile  trans- 
mission, the  bit  stream  has  no  coded  digital  significance 
but  rather  each  bit  corresponds  to  a particular  picture 
element  in  the  original  document.  Alphanumeric  transmission 
of  alphanumeric  data  is  significantly  more  efficient  than 
facsimile transmission of alphanumeric data. For example,
Reference 9 indicates that approximately one million bits
are required to digitize a typical business document for
transmission in a facsimile mode. Assuming the document
contains about 300 characters and the digital code for each
character consists of eight bits, the total number of bits
in the coded digital representation of the data is 2,400
bits. Therefore, alphanumeric transmission of alphanumeric
data is roughly 400 times (more than two orders of magnitude)
more efficient than
facsimile transmission. However, for graphic data, pictorial
data,  or  handwritten  data  (such  as  a person's  signature), 
facsimile  transmission  is  the  more  efficient  if  not  the 
only  transmission  mode  to  be  used. 
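
The comparison of the two transmission modes reduces to a ratio of bit counts. The sketch below uses only the figures given in the text (one million facsimile bits, 300 characters, eight bits per character); the variable names are illustrative.

```python
import math

FACSIMILE_BITS = 1_000_000       # typical business document, per Reference 9
CHARS_PER_DOCUMENT = 300
BITS_PER_CHAR = 8

coded_bits = CHARS_PER_DOCUMENT * BITS_PER_CHAR   # alphanumeric representation
ratio = FACSIMILE_BITS / coded_bits               # facsimile bits per coded bit
orders_of_magnitude = math.log10(ratio)

print(coded_bits)
print(f"{ratio:.0f}x, about {orders_of_magnitude:.1f} orders of magnitude")
```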

The estimate of the data communications cost was
established based on the quantity of bits to be transmitted.
It is estimated that a CRT operator inputs an average of 80
to 100 characters per screen (document image). This corresponds
to the data which is typed on the predesigned documents in
the current system. Therefore, for 100 document images (50
documents front and back), the total number of bits to be
transmitted per day on an average from each CBPO is 80,000
bits (100 images x 100 characters x 8 bits per character).
The background data on the CRT screen which corresponds
to the fixed data on the predesigned documents such as field
identifiers is not transmitted. A low speed voice grade
line (1200 bits per second) will suffice in meeting this
daily traffic. Approximately 1.1 hours per day or 23 hours
per month (using a conversion factor of 21 days per month)
are required to meet the data load. Figure 17 indicates that
for a distance of 300 miles or greater and a total transmission
time of 23 hours per month, it is more cost effective
to use the switched network than the leased lines. Figure
21 which was also taken from Reference 9 indicates that for
a distance of approximately 1,000 miles and a daily transmission
time of 1.1 hours, the cost for data communications
is approximately $25.00 per day or $525.00 per month for
each CBPO or $64,050.00 per month for all 122 CBPOs.
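
The Concept 3 communications figures can be checked with the sketch below. The inputs are from the text, with the $158,600 Concept 1 figure taken from the earlier discussion; the names are illustrative. The raw line time computed here ignores any protocol, dialing, or operator overhead, and on that basis 80,000 bits at 1,200 bits per second take only about 67 seconds; the 1.1 hours per day used in the text is taken as stated and is not derived here.

```python
IMAGES_PER_DAY = 100
CHARS_PER_IMAGE = 100          # upper end of the 80 to 100 range
BITS_PER_CHAR = 8
LINE_BPS = 1_200
DAYS_PER_MONTH = 21
CBPO_COUNT = 122

bits_per_day = IMAGES_PER_DAY * CHARS_PER_IMAGE * BITS_PER_CHAR
raw_line_seconds = bits_per_day / LINE_BPS      # excludes all overhead

hours_per_month = 1.1 * DAYS_PER_MONTH          # daily figure as stated in the text
monthly_per_cbpo = 25.00 * DAYS_PER_MONTH       # switched-network cost per CBPO
monthly_total = monthly_per_cbpo * CBPO_COUNT
share_of_concept_1 = monthly_total / 158_600    # versus the leased-line network

print(bits_per_day, round(raw_line_seconds, 1))
print(round(hours_per_month, 1), monthly_per_cbpo, monthly_total)
print(f"{share_of_concept_1:.0%} of the Concept 1 communications cost")
```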

A reasonable  cost  estimate  for  the  equipment  could  not 
be  established.  However,  indications  are  that  the  cost  will 
be  significant.  Factors  which  contributed  to  this  situation 
are  that  the  extensive  software  design  cost  is  difficult  to 
estimate  based  on  a concept  rather  than  a detailed  system 
design  and  the  mass  storage  unit  required  at  each  field  loca- 
tion is  estimated  to  cost  at  least  $50,000.00. 

In this concept, the "source document creation" function
represents a sizeable software task. To produce a "software
copy" of the multiplicity of complex forms currently used in
the military personnel record for purposes of CRT terminal
data entry will require extensive software development. It
would appear that in order for this concept to be viable, a
type of forms re-design would also be required as in the
case of Concept 2. The objective of this forms re-design
effort would be to reduce significantly the number of different
types of document forms used and to simplify the document
design. This would reduce the number and complexity of
the different types of formatted screens that would have to
be generated for operator entry of selected data. One of the
primary tasks performed at the CBPOs is the "records review"
function.  Therefore,  in  addition  to  the  software  required 
for  source  document  creation,  software  must  also  be  developed 
to  provide  the  user  the  capability  for  conducting  records 
reviews.  Software  development  and/or  modifications  will  also 
be  required  at  the  AFMPC  to  perform  the  automatic  index  data 
capture  as  well  as  other  functions  required  to  implement  the 
concept. 

The  size  estimate  for  the  mass  storage  requirement  at 
each  CBPO  was  determined  from  data  provided  in  Reference  11. 
The  number  of  records  maintained  at  the  122  CBPOs  is  569,210 
thus  producing  an  average  of  4,666  records  per  CBPO.  Using 
a weighted  average  of  four  fiche  per  record  for  officers  and 
airmen,  and  a weighted  average  of  13  images  (pages)  per  fiche 
for both officers and airmen, an average of 150 characters
per  image  and  eight  bits  per  character,  the  mass  storage 
requirement  estimate  was  0.29  billion  bits  per  CBPO. 
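
The mass storage estimate is a straightforward product of the averages quoted above. The following sketch uses only figures from the text; the function and parameter names are illustrative.

```python
def storage_bits(records, fiche_per_record=4, images_per_fiche=13,
                 chars_per_image=150, bits_per_char=8):
    """Bits needed to hold coded digital copies of the given records,
    using the weighted averages quoted in the text as defaults."""
    return (records * fiche_per_record * images_per_fiche
            * chars_per_image * bits_per_char)


records_per_cbpo = round(569_210 / 122)      # average records per CBPO
cbpo_bits = storage_bits(records_per_cbpo)

print(records_per_cbpo)
print(f"{cbpo_bits:,} bits = {cbpo_bits / 1e9:.2f} billion bits")
```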

Recall  that  the  two  primary  activities  performed  at  the 
CBPOs  are  source  document  creation  and  records  review.  The 
primary activity at the MAJCOMS is records review. Therefore,
the MAJCOMS were not considered in the previous estimates
on up-date document flow since the bulk of the
documents  originate  at  the  CBPOs.  However,  in  discussing 
a concept  in  which  the  field  record  (paper  or  film)  is  elimi- 
nated and  a coded  digital  version  is  substituted  in  its 
place,  the  mass  storage  requirement  at  the  MAJCOMS  must  also 
be  considered.  The  number  of  records  maintained  at  the  26 
major  commands  (MAJCOMS)  is  95,753  thus  producing  an  average 
of  3,683  records  per  MAJCOM.  Using  the  same  conversion  factors, 
the  mass  storage  requirement  at  each  MAJCOM  was  0.23  billion 
bits.  These  mass  storage  units  are  expected  to  cost  at  least 
$50,900 each. This estimate was based on the Digital Equipment
Corporation System SM-30HHA-LA. This system consists of
a dual RK07 cartridge disk-based PDP 11/34A real time package.
It includes the RSX-11 multi-user operating system, and a
RK07 drive with a 28 megabyte (0.22 billion bits) storage
capacity.  Hardware  maintenance  cost  for  this  system  is  $441.00 
per  month.  Software  support  for  this  and  other  DEC  Systems 
is  provided  on  a per-call  basis  ($50  per  hour,  plus  expenses), 
a weekly  basis  ($1,800  per  week,  plus  expenses),  for  a six- 
month  period  ($5,500  per  month)  and  for  a 12-month  period 
($5,200  per  month).  A software  support  cost  for  at  least 
one  month  will  be  included  as  part  of  the  system  costs  ($7,200). 
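
As a rough check of the MAJCOM storage requirement against the disk capacity quoted above, the sketch below repeats the per-record conversion and counts the RK07 drives needed. Record counts and conversion factors are from the text; treating a megabyte as one million bytes is an assumption that matches the 0.22 billion bit figure, and reading the $7,200 software support cost as four weeks at the $1,800 weekly rate is also an assumption.

```python
import math

BITS_PER_RECORD = 4 * 13 * 150 * 8     # fiche x images x characters x bits
RK07_BITS = 28 * 10**6 * 8             # 28 megabytes, decimal (assumed)

records_per_majcom = round(95_753 / 26)
records_per_cbpo = round(569_210 / 122)

majcom_bits = records_per_majcom * BITS_PER_RECORD
cbpo_bits = records_per_cbpo * BITS_PER_RECORD

majcom_drives = math.ceil(majcom_bits / RK07_BITS)
cbpo_drives = math.ceil(cbpo_bits / RK07_BITS)

one_month_software_support = 4 * 1_800  # assumed: four weeks at the weekly rate

print(records_per_majcom, f"{majcom_bits:,}", majcom_drives)
print(records_per_cbpo, f"{cbpo_bits:,}", cbpo_drives)
print(one_month_software_support)
```

Under these assumptions both site types slightly exceed the capacity of a single RK07 drive, so the dual-drive configuration named in the text would be needed to hold the records.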


The  CRT  terminal  is  estimated  to  cost  $15,000.  It  must 
be  a high  resolution  device  capable  of  displaying  a full  page 
document. A three-terminal configuration will be required at
the CBPOs to support both the source document creation and
records review functions. A two-terminal configuration should
suffice  at  the  MAJCOMS  for  records  review. 

An estimate of $137,500 was used as the cost of a computer-output-microfilm
system. This estimate was based on the
price of the Model 715 COM System manufactured by the 3M
Company.

In summary, this concept involves each CBPO and MAJCOM
retaining  a coded  digital  record  of  all  assigned  personnel  to 
facilitate  the  records  review  activity.  It  also  involves  the 
creation  of  source  documents  via  CRT  terminal  at  the  CBPOs, 
the  electronic  transmission  of  those  documents  in  an  alpha- 
numeric mode  to  the  AFMPC  for  file  update  and  the  generation 
of  a film  strip  of  the  update  images  via  COM. 

CONCEPT 4, which is depicted in Figure 22, indicates the
relationship between this study and a previous study entitled
"Long Range Microimage Transmission Techniques Study for AFMPC"
(Reference 9). In that study, a similar concept was considered
in which the paper copies of the master record at the 26 MAJCOMS
were eliminated. Further, no other form of the record was substituted
in its place. CRT terminals were used to facilitate
the records review and other user activity which involved
the  requests  of  data  contained  on  microfiche  from  AFMPC.  The 
requested  data  was  provided  to  the  26  MAJCOMS  via  electronic 
transmission  in  a facsimile  mode.  Therefore,  Concept  4 repre- 
sents an  extension  of  the  concept  discussed  in  the  Long  Range 
Microimage  Transmission  Techniques  (LRMTT)  Study.  In  this 
concept,  data  from  AFMPC  is  provided  to  the  122  CBPOs  in 
addition  to  the  26  MAJCOMS.  One  of  the  conclusions  reached 
in  that  study  was  that  the  data  communications  cost  between 
the  26  MAJCOMS  and  the  AFMPC  would  be  a significant  factor  in 
the  recurring  cost  of  the  system.  Consequently,  the  data 
communications  cost  in  this  concept  will  be  exorbitant. 

Although  the  122  CBPOs  were  not  considered  in  the  LRMTT  Study, 
sufficient  data  was  provided  to  estimate  the  data  communications 
cost  for  a network  which  includes  the  122  CBPOs  in  addition  to 
the  26  MAJCOMS.  Specifically,  an  estimate  of  the  image  traffic 
in  support  of  the  records  review  and  other  user  activity  was 
established  for  the  CBPOs  (as  well  as  the  MAJCOMS)  in  the 
Operational  Requirements  Section.  The  estimated  average  image 
traffic in response to user requests was 2,103 images per day
for each CBPO. This consisted of 730 images per day of on-line
traffic (corresponding to requests for which an immediate
response is required) and 1,373 images per day of off-line
traffic (corresponding to requests for which an immediate
response is not required). The analysis conducted for the
MAJCOMS  in  the  LRMTT  Study  indicated  that  the  user  requests 
at  each  MAJCOM  resulted  in  an  image  traffic  of  835  images 
per  day  (290  on-line  and  545  off-line).  In  order  to  satisfy 
these  requests  during  a single  shift  operation,  a 230.4  KBPS 
data  link  was  required.  Therefore,  the  on-line  traffic  por- 
tion (730  images  per  day)  of  the  CBPO  image  traffic  will 
require  a similar  link.  Figure  18  indicates  that  the  monthly 
lease  cost  for  a 230.4  KBPS  link  transmitting  over  an  average 
distance  of  1,000  miles  is  approximately  $35,000.00.  Since 
the off-line requests do not require an immediate response,
they  may  be  satisfied  via  the  U.S.  Postal  Service.  The  postal 
estimate  for  the  off-line  traffic  was  approximately  $95.00  per 
month  per  CBPO.  The  estimate  was  based  on  the  analysis  used 
in  the  LRMTT  Study  for  the  MAJCOM  off-line  traffic  along  with 
the  CBPO  user  request  data  provided  in  Reference  11.  In  the 
LRMTT  Study,  the  monthly  postal  cost  to  satisfy  241  off-line 
requests  per  month  at  each  MAJCOM  was  approximately  $38.00. 

The  off-line  traffic  for  each  of  the  122  CBPOs  corresponds 
to  604  requests  per  month.  Therefore,  the  monthly  postal 
cost  for  the  CBPO  off-line  requests  will  be  approximately 
$95.00.  The  total  monthly  cost  of  data  communications  via 
electronic  transmission  and  the  postal  service  for  this 
concept  is  approximately  $36,000  for  each  of  the  122  CBPOs. 
Whereas the 230.4 KBPS data link was required to satisfy the
on-line requests during a single shift, a double shift operation
would permit the use of a lower speed data link which
would be less expensive. Figure 16 indicates that a 50 KBPS
data link can transmit the on-line CBPO traffic in a contiguous
block in approximately six hours. However, as noted
in the LRMTT Study, time will be required to retrieve the
appropriate fiche from the file, load and unload the scanner,
and scan the fiche. This operation will exceed a single shift.
Figure 18 indicates that the monthly lease cost for a 50 KBPS
data link transmitting a distance of 1,000 miles is approximately
$7,000.00. Therefore, the monthly cost of
data communications using a 50 KBPS data link would be approximately
$8,000 per CBPO (postal cost included).
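
The Concept 4 per-CBPO communications figures can be sketched as below. The traffic counts, lease costs, and the MAJCOM postal baseline are from the text; scaling the postal cost linearly with the number of off-line requests is an assumption that reproduces the $95 estimate. The unrounded sums are somewhat below the approximate totals of $36,000 and $8,000 quoted in the text.

```python
ON_LINE_IMAGES = 730
OFF_LINE_IMAGES = 1_373
total_images_per_day = ON_LINE_IMAGES + OFF_LINE_IMAGES

# Postal cost for off-line traffic, scaled from the MAJCOM baseline
# (241 requests per month for about $38), assuming cost per request is constant.
cost_per_request = 38.00 / 241
cbpo_postal_monthly = cost_per_request * 604

single_shift_monthly = 35_000 + cbpo_postal_monthly   # 230.4 KBPS lease plus postal
double_shift_monthly = 7_000 + cbpo_postal_monthly    # 50 KBPS lease plus postal

print(total_images_per_day)
print(round(cbpo_postal_monthly, 2))
print(round(single_shift_monthly), round(double_shift_monthly))
```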

In  the  LRMTT  Study,  cost  estimates  were  also  established 
for  the  equipment  and  manpower  impact  at  AFMPC  to  implement 
a capability  for  responding  to  user  requests  from  the  26  MAJCOMS. 
It  becomes  apparent  that  the  cost  estimates  for  the  122  CBPO 
network  will  be  extreme.  Due  to  the  excessive  cost  indicated, 
this  concept  does  not  warrant  any  further  discussion. 

Table  V is  a summary  of  the  monthly  costs  for  the  different 
concepts.  All  purchase  costs  have  been  distributed  over  an 
assumed  equipment  life  span  of  10  years  in  order  to  provide  a 
common  reference  for  comparison  with  monthly  rental  costs. 

VII. SUMMARY AND CONCLUSIONS:

There  are  numerous  data  entry  devices  and  techniques 
that  are  provided  by  the  current  state-of-the-art  in  source 
data automation (SDA). These techniques range from the tried
and proven method of operator keystroking of data via keyboard,
to the optical reader devices which are currently
assuming a greater share of the data input load, and voice
input technology which is largely developmental and whose
application is presently limited.

The  selection  of  the  most  cost-effective  source  data 
automation  method  is  accomplished  primarily  by  examining 
the  volume  of  data  to  be  processed.  For  example,  when 
considering  the  selection  of  a  key-to-disc  versus  an  OCR 
approach,  the  volume  of  data  to  be  processed,  in  conjunction 
with  the  number  of  fonts  to  be  read,  determines  which  is  the 
more  cost-effective  approach.  For  a  multifont  requirement, 
the  break-even  volume  is  approximately  three  times  that  for 
a  single  font  requirement,  because  the  multifont  OCR  device 
is  significantly  more  expensive  than  the  single  font  unit. 
Therefore,  as  a  general  rule,  OCR  direct  read  may  offer  the 
greatest  advantage  in  high  volume  situations.  However,  for 
low  volume  data  input,  key  entry  may  offer  the  lowest  cost 
based  on  the  parameters  analyzed  and  the  system  constraints 
selected.  This  is  especially  true  if  the  hardware  costs 
exceed  the  labor  costs. 
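
The break-even relationship described above can be sketched with a simple linear cost model. This is an illustration only: the model (a fixed monthly equipment cost plus a per-document cost for each method) and every dollar figure below are hypothetical and are not taken from the report's analysis.

```python
def break_even_volume(fixed_ocr: float, fixed_key: float,
                      unit_key: float, unit_ocr: float) -> float:
    """Monthly document volume above which OCR costs less than key entry.

    ASSUMED model: monthly cost = fixed cost + unit cost * volume.
    """
    if unit_key <= unit_ocr:
        raise ValueError("OCR breaks even only if its per-document cost is lower")
    return max(0.0, (fixed_ocr - fixed_key) / (unit_key - unit_ocr))

if __name__ == "__main__":
    # All figures are hypothetical (dollars per month, dollars per document).
    key_fixed, key_unit, ocr_unit = 1_000, 0.50, 0.25
    single_font = break_even_volume(5_000, key_fixed, key_unit, ocr_unit)
    multifont = break_even_volume(13_000, key_fixed, key_unit, ocr_unit)
    print(single_font, multifont, multifont / single_font)
```

With these illustrative inputs the more expensive multifont device breaks even at three times the single font volume, which mirrors the ratio cited in the text. Below the break-even volume, key entry remains the cheaper method.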

The  current  microform  system  contains  a  degree  of  automation 
in  that  the  index  data  is  captured  via  CRT  terminals 
in  an  on-line  environment.  Consequently,  the  incorporation 
of  any  SDA  technique(s)  applicable  to  the  microform  system 
should  provide  a capability  beyond  that  contained  in  the 
current  system.  The  SDA  techniques  considered  for  automating 
selected  functions  of  the  microform  system  were  found  to 
have  both  advantages  and  disadvantages. 

Concept  1 considered  automating  the  document  transfer 
function.  Instead  of  forwarding  the  source  documents  created 
at  the  CBPOs  to  the  AFMPC  via  the  postal  service,  the  docu- 
ments would  be  transmitted  electronically  in  a facsimile  type 
of  operation.  A facsimile  printer  located  at  AFMPC  would 
produce  a paper  document  output  which  would  be  processed  as 
in  the  current  system.  The  advantages  of  this  concept  are 
that  it  can  capture  the  signatures  on  the  documents  and  it 
eliminates  the  three  to  four  day  time  lag  which  is  character- 
istic of  the  postal  system.  However,  the  cost  of  data  commu- 
nications will  be  significantly  higher  than  the  cost  of 
document  transfer  in  the  current  system,  and  the  legibility 
of  the  small  characters  on  the  paper  document  which  identify 
the  various  data  fields  will  be  marginal. 

Concept  2 considered  the  use  of  an  OCR  device  at  the 
AFMPC  facility  for  the  capture  of  microform  index  data  in 
conjunction  with  OCR  typewriters  with  a  specialized  font 
for  source  document  creation  at  the  CBPOs.  The  processing 
speed  of  current  OCR  devices  is  such  that  only  one  device 
would  be  required  to  handle  the  workload  of  approximately 
6,000  documents  per  day.  This  would  result  in  a  significant 
reduction  in  the  number  of  operators  (currently  seven) 
performing  the  function  of  index  data  capture  using  CRT 
terminals  in  an  on-line  environment.  However,  the  success 
of  an  OCR  device  as  a  means  of  inputting  data  in  the 
microform  system  depends  upon  a  redesign  of  all  source 
document  forms  to  accommodate  an  OCR  device,  and  upon 
imposing  tight  restrictions  and  controls  on  quality  during 
forms  preparation.  The  greater  the  extent  to  which  the 
quality  conditions  are  met,  the  simpler  is  the  machine 
required  to  achieve  an  acceptable  reading  performance,  and 
excessive  equipment  cost  may  be  avoided.  The  bureaucratic 
inertia  associated  with  a  forms  redesign  effort  for  DOD  forms 
as  well  as  Air  Force  forms  appears  as  large  and  complex  as 
the  technological  deficiencies  which  must  be  solved  if  indeed 
an  OCR  device  is  to  be  used  successfully  in  the  current 
environment  without  any  forms  redesign.  Current  assessment  is 
that  the  successful  utilization  of  OCR  equipment  as  a  data 
entry  device  will  require  a  forms  redesign  and  forms  control 
in  order  to  realize  an  acceptable  throughput  with  a  minimum 
of  rejects  due  to  "no  reads,"  and  more  importantly,  a  minimum 
of  substitution  errors. 
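
The workload figure can be turned into a rough throughput requirement for a single reader. This sketch assumes the 6,000 documents are processed in one 8-hour shift; the shift length and the reject rate used below are hypothetical values chosen for illustration, not figures from the report.

```python
DOCS_PER_DAY = 6_000  # daily workload quoted in the text

def required_docs_per_minute(docs_per_day: int, shift_hours: float) -> float:
    """Sustained reading rate one device needs to clear the daily workload."""
    if shift_hours <= 0:
        raise ValueError("shift_hours must be positive")
    return docs_per_day / (shift_hours * 60)

def expected_rejects(docs_per_day: int, rejects_per_thousand: float) -> float:
    """Documents per day falling back to manual handling due to 'no reads'."""
    return docs_per_day * rejects_per_thousand / 1000

if __name__ == "__main__":
    # ASSUMPTIONS: one 8-hour shift; 20 rejects per 1,000 documents.
    print(required_docs_per_minute(DOCS_PER_DAY, 8))  # 12.5
    print(expected_rejects(DOCS_PER_DAY, 20))         # 120.0
```

Every rejected document still needs an operator, so the achievable reduction in staff depends directly on how far forms redesign and forms control can lower the reject rate.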

Concept  3  and  Concept  4  represent  a  radical  approach 
from  a  source  data  automation  systems  standpoint  in  that  both 
considered  the  elimination  of  the  paper  record  at  the  CBPOs 
and  MAJCOMS.  Both  concepts  have  two  distinct  disadvantages: 
they  cannot  capture  the  signatures  that  would  ordinarily  be 
on  many  of  the  original  paper  documents,  and  they  are 
extremely  costly  to  implement  and  operate.  A  CRT 
terminal  is  used  at  the  CBPOs  for  "source  document  creation." 
The  data  input  by  the  operator  is  then  transmitted  electron- 
ically using  a coded  digital  format  typical  of  alphanumeric 
data  transmission  rather  than  the  facsimile  format.  A sizable 
software  effort  is  required  to  generate  the  document  images 
at  the  CBPO.  The  software  effort  pertains  to  the  applications 
programs  required  to  duplicate  the  outline  of  the  multiplicity 
of  forms  currently  used  in  the  personnel  record,  in  order  that 
the  operator  can  enter  the  required  data  in  the  appropriate 
fields.  A type  of  forms  redesign  to  reduce  significantly  the 
number  of  different  types  of  documents  would  reduce  the  number 
of  formatted  screens  that  would  have  to  be  generated  for 
operator  entry  of  data. 

The  difference  between  Concept  3  and  Concept  4  is  that 
in  Concept  3  a  coded  digital  copy  of  the  record  of  all  assigned 
personnel  is  substituted  for  the  paper  record  at  the 
CBPOs  and  MAJCOMS.  Therefore,  the  major  cost  factor  in 
Concept  3  is  the  cost  of  the  mass  storage  systems  required 
at  each  location.  In  Concept  4,  no  substitute  is  made  for 
the  paper  record.  The  data  requested  in  support  of  user 
activity  is  transmitted  electronically  from  AFMPC  to  the 
respective  CBPO  or  MAJCOM;  thus,  the  major  cost  factor  in 
this  concept  is  the  monthly  data  communications  cost. 

The  SDA  concepts  discussed  involve  either  a significant 
cost  impact  or  a significant  operational  impact.  The  cost 
impact  is  associated  with  Concepts  1,  3 and  4.  The  opera- 
tional impact  resulting  from  a required  forms  redesign  is 
associated  with  Concept  2.  The  impact  of  a forms  redesign 
effort  could  be  significantly  reduced  if  only  the  high-use 
Air  Force  documents  such  as  the  selection  folder  material 
are  redesigned.  This  would  keep  the  effort  within  the  Air 
Force  and  eliminate  the  requirement  to  involve  DOD  (and  the 
other  services)  in  the  redesign  decision  process.  It  is 
very  likely  that  the  volume  of  high-use  Air  Force  documents 
is  sufficient  to  justify  on  a cost-effective  basis  the  use 
of  a single  font  OCR  page  reader.  Although  this  concept 
appears  to  be  the  most  viable  of  the  concepts  discussed 
(assuming  a successful  forms  redesign),  a thorough  systems 
analysis  should  be  performed  to  establish  a more  definitive 
systems  concept  from  which  more  accurate  cost  estimates  can 
be  derived. 